QUESTIONS METADATA CAN ANSWER
Written By: Robert S. Seiner – TDAN.com / Ciber, Inc.
Published with Permission from Robert S. Seiner
Built on Metadata
The world of information
technology has grown up dramatically in the last fifteen years,
the term of my comparatively short career.
From the days of punching cards and feeding deck readers at
But not even close … One
can only imagine what the next fifteen years have in store for us. Post-Y2K and for the foreseeable future, the
need to manage data, information, and knowledge, and to manage them quickly, will (if it has not
already) become the business driver.
"Managing data, information, and knowledge will be the business
driver."
The above is a phrase
worth repeating several times if you don't already know it. A company's ability to manage data,
information, and knowledge will determine how successful that company can be, or
whether it can be successful at all.
To manage data, information, and knowledge,
companies need to know what data they have.
Companies need to know precisely how their data is being used and
how that data can be used to create competitive
advantage. To know these things, a
company needs to manage and use its metadata.
Metadata is information,
documented in IT tools, that improves both business and technical
understanding of data and data-related processes.1 This definition is significantly longer
than the "data about data" definition that is overused by folks in
our industry. When you break this
definition into pieces, it tells us what metadata is, where it can be found, how
it can be helpful, and whom it will help.
Metadata will become
increasingly important over the next fifteen years. Metadata will no longer be the
"Wednesday's child of information processing systems"2 as stated by
the father of data warehousing, Bill Inmon, in Data Management Review.
Every company has
metadata. There is no question about
that. Databases are built on
metadata. Data models are built on
metadata. Programs, screens, reports,
queries, data movement … all of the components of information systems are built
using metadata. This, on its own, should make it obvious that managing metadata is
important. But it doesn't.
Metadata Questions
Questions are still raised
about metadata. What exactly is
metadata? How much will it cost to
manage metadata? How do I justify the "investment"
in metadata? Who uses metadata? How does one get started managing
metadata? These are all very important
questions, and the answers often determine whether or not a
company will proceed with a metadata management strategy and implementation
plan.
These questions are not
always easy to answer, especially when the person asking the
questions is somewhat removed from the daily building of the data and
technical architectures that support the enterprise, namely the person most
likely to foot the bill for the effort.
Experts have written volumes that answer these questions. These questions will not be addressed in this
article. Instead of answering these
questions, I choose to take a different approach. Instead of focusing on "answers" to
metadata questions, this article will focus on the "questions" that
metadata can answer.
Question Categories
The "questions
metadata can answer" fall into ten categories. I chose ten because it is
a good round number. Beyond that, there was really no
reason other than that this is a logical breakdown of metadata that I have used
before. If these categories do not suit
your needs, organize your own according to the requirements of your
organization. The ten categories I
selected are:
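· Database Metadata
· Data Model Metadata
· Data Movement Metadata
· Business Rule Metadata
· Data Stewardship Metadata
· Application Component Metadata
· Data Access / Reporting Metadata
· Rationalization Metadata
· Data Quality Metadata
· Computer Operations Metadata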
Reading the Questions
While you are reading the
list of "questions metadata can answer", ask yourself three simple
questions about your current environment:
Can my company answer these questions?
What is it costing my company to answer these questions?
What is the result of not being able to answer these questions?
I think you will be
surprised at how easy it is to justify metadata management if you look at your
answers to the three questions listed above regarding the Questions Metadata
Can Answer.
Many of the questions fall
under multiple categories. For example,
during data movement, data flows from source to target. The value assigned to
the target may come from a map list (or conversion table), depending on the
source or sources. The action
that is taken when source data is missing, or when a source value does not have
an assigned target value (sometimes known as a missing rule), can be considered
either data movement metadata or data quality metadata.
I list each question once, trusting that you can draw the connection if
necessary. The questions should not be
considered all-encompassing. Rather,
consider the metadata questions a "starter kit" that can help
your company understand that:
· The answers to these questions are important.
· The answers to these questions are NOT always available.
· The IT division will "perform" better if they have access to this information.
· "Cost savings" and "competitive advantages" are associated with managing data through metadata.
Questions Metadata Can Answer
Database Metadata
Database metadata
describes the physical data. Database
metadata is typically stored in the database catalog or in copybook/segment
definitions and is accessed by developers and database administrators using
database or File-AID-type tools.
· Does the data exist in a database (or a flat/sequential file)?
· What databases exist?
· What is the physical name of the database where the data is stored?
· Where is the data located? Platform (or dbms), server?
· What are the names of the tables in the database?
· What columns are on the tables?
· What is the primary key?
· What other indexes exist?
· How is this table related to other tables?
· Is this table part of any views?
· When was the database last updated?
· Who last updated the data?
· What flat and sequential files exist?
· What is the physical name of the dataset where my data is stored?
· Where is the data located? Mainframe, region, dataset name?
· How many generations of the data exist?
· Do the datasets exist on tape or on storage?
· What copybooks represent the data in the file?
· What programs use the copybook?
· What job streams execute the programs?
· How is the data processed, combined, sorted?
· much more ...
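As an illustration only, several of the database questions above (what tables exist, what columns are on them, what the primary key is, what other indexes exist) can be answered by reading a database catalog directly. The sketch below assumes SQLite, chosen simply because it ships with Python; the `customer` table, its index, and the `describe_database` helper are hypothetical names made up for the example, and other database products expose their catalogs differently.

```python
import sqlite3


def describe_database(conn):
    """Return {table: {"columns": [...], "primary_key": [...], "indexes": [...]}}."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # PRAGMA index_list rows start with (seq, name, unique, ...)
        indexes = conn.execute(f'PRAGMA index_list("{table}")').fetchall()
        key_columns = sorted((row for row in info if row[5] > 0), key=lambda r: r[5])
        catalog[table] = {
            "columns": [row[1] for row in info],
            "primary_key": [row[1] for row in key_columns],
            "indexes": [row[1] for row in indexes],
        }
    return catalog


# Hypothetical table and index, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)
conn.execute("CREATE INDEX idx_customer_region ON customer (region)")

catalog = describe_database(conn)
for table, details in catalog.items():
    print(table, details)
```

Questions such as "who last updated the data?" are usually not stored in the catalog itself and would need audit or log metadata instead.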
Data Model Metadata
Data Model metadata
describes the logical design of the data and the mapping from the logical
design to physical data. Data model
metadata can also include business rules, entity relationships, and domain
values. Data model metadata is typically
found in data modeling and CASE tools, although some may still track this
information in diagram and spreadsheet tools.
· What data models exist?
· Where can the models be found?
· Is there an enterprise data model?
· Who created the models and for what purpose, project / database?
· Who is responsible for keeping the models up to date?
· What business entities have been defined and what models do they exist on?
· Where are the business entities represented in databases/tables, systems/files?
· What are the definitions of the business entities?
· What attributes make up these entities?
· What is the business definition of the attributes?
· Do the attributes have restrictive domains?
· What are the allowable values for the attributes?
· What is the relationship between the logical data model and the physical data model?
· Is the physical data model in synch with the logical data model?
· Is the physical data model in synch with the physical database?
· What maps exist between entities and tables, attributes and columns?
· much more ...
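To make the "in synch" questions concrete, here is a minimal sketch that compares a logical-to-physical map against the columns that actually exist. Everything in it is hypothetical: the entity and attribute names, the `physical_columns` catalog, and the `sync_report` helper are assumptions made for the example, not the output of any particular modeling tool.

```python
# Hypothetical map: (entity, attribute) -> (table, column)
logical_to_physical = {
    ("Customer", "Customer Identifier"): ("customer", "customer_id"),
    ("Customer", "Customer Name"): ("customer", "name"),
    ("Customer", "Sales Region"): ("customer", "region_cd"),
}

# Hypothetical physical catalog: table -> columns that actually exist
physical_columns = {"customer": {"customer_id", "name", "created_ts"}}


def sync_report(mapping, physical):
    """List mapped columns missing from the database and columns with no mapping."""
    mapped = set(mapping.values())
    actual = {(table, col) for table, cols in physical.items() for col in cols}
    return {
        "missing_in_database": sorted(mapped - actual),
        "unmapped_in_model": sorted(actual - mapped),
    }


report = sync_report(logical_to_physical, physical_columns)
print(report)
```

An empty report would suggest the model and the database agree; anything listed is a place where one has drifted from the other.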
Data Movement Metadata
Data movement metadata
describes the movement of data from source to target. Data movement metadata includes information
about the selection and extraction of data, mapping, transformation, and
loading of data. Data movement metadata
can be found in ETL or data movement tools, spreadsheets, desktop databases, or
in the logic of the code written to perform the data movement.
· Where did my data originate? What system or database did it come from?
· What field was used to populate this data, or was the field derived?
· How was the data derived? Using calculations, conditionals, or both?
· In the derivation, what other data was used?
· Is the value of this data dependent on the values of other data? What data and how?
· Is the target data allowed to be null?
· What was done when data was missing?
· What action was taken when source data did not fall within quality guidelines?
· What action was taken when the source value was not assigned a mapped target value?
· What values can the target data take on?
· How do these values map to the previous values?
· When is the data moved?
· Has the data always "moved" this way or is there a history of changes over time?
· When did those changes take place?
· much more ...
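The map list (or conversion table) and the missing rule mentioned earlier can be captured as data rather than buried in program logic. The following is a rough sketch under stated assumptions: the `FieldMapping` class, the `ORD_STAT` source field, and the `"UNKNOWN"` default are all hypothetical and do not reflect the format of any specific ETL tool.

```python
from dataclasses import dataclass


@dataclass
class FieldMapping:
    source_field: str
    target_field: str
    value_map: dict                  # conversion table: source value -> target value
    missing_rule: str = "UNKNOWN"    # target value used when no mapping applies

    def transform(self, source_value):
        # Missing source data and unmapped source values both fall to the missing rule.
        if source_value is None:
            return self.missing_rule
        return self.value_map.get(source_value, self.missing_rule)


# Hypothetical mapping for an order status code.
status = FieldMapping(
    source_field="ORD_STAT",
    target_field="order_status",
    value_map={"O": "Open", "C": "Closed"},
)

moved = [status.transform(value) for value in ["O", "C", None, "X"]]
print(moved)
```

Because the mapping is an ordinary record, it can be stored, versioned, and queried, which is what allows questions such as "what action was taken when the source value was not assigned a mapped target value?" to be answered later.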
Business Rule Metadata
Business Rule metadata
describes how the business operates through the use of its data. Business Rule
metadata describes the entity relationships, cardinality, and domain rules that define the use
of data. Business Rule metadata
typically exists in data modeling or CASE tools, or in other forms of
documentation maintained outside of a tool, such as word processing documents and spreadsheets.
· What is the relationship between two entities of data in the logical data model?
· What is the cardinality between those same entities?
· What are the conditions under which a piece of data can take on certain values?
· What values can a piece of data take on? What are the values' meanings?
· How is data created, updated, deleted?
· When are rules established? By whom?
· much more ...
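One way to picture domain rules and conditional rules is as small, checkable definitions. The sketch below is illustrative only: the `order_status` domain, the rule that a closed order needs a close date, and the `violations` helper are invented for the example.

```python
# Hypothetical domain rules: attribute -> {allowed value: meaning}
domain_rules = {
    "order_status": {"O": "Open", "S": "Shipped", "C": "Closed"},
}


def violations(record):
    """Return a list of rule violations for one record (empty if it is clean)."""
    problems = []
    for attribute, allowed in domain_rules.items():
        value = record.get(attribute)
        if value not in allowed:
            problems.append(f"{attribute}: value {value!r} is outside the domain")
    # Hypothetical conditional rule: a closed order must carry a close date.
    if record.get("order_status") == "C" and not record.get("close_date"):
        problems.append("close_date: required when order_status is 'C'")
    return problems


print(violations({"order_status": "O"}))
print(violations({"order_status": "C"}))
print(violations({"order_status": "X"}))
```

The same `domain_rules` table answers both "what values can a piece of data take on?" and "what are the values' meanings?".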
Data Stewardship Metadata
Data Stewardship metadata
describes who in the organization is accountable for actions taken using
data. Data Stewardship metadata defines
who in the organization defines the data, who in the organization creates,
maintains, and eliminates data, and who consumes the data or directly uses the
data or information in their jobs. Data
Stewardship metadata is not maintained by a lot of companies (yet!) but those
that do manage this type of metadata use desktop databases and
spreadsheets.
· Who do you call if you have a question about the data?
· Who is responsible for defining, creating, reading, updating, and deleting the data?
· What accountabilities go along with the actions those individuals can take with the data?
· Who are the data "consumers" who use the data as part of their job?
· What information can be shared within the company? Outside the company?
· Who has to approve reports that are being distributed outside the company?
· Who is responsible for assigning acceptable values for the data?
· How does the stewardship program relate to the company information policy?
· What information exists in the information policy?
· Where can I find the information policy?
· much more ...
Application Component Metadata
Application Component
Metadata describes all objects of an application from data files or tables, to
programs, to scripts and jobs, to screens.
Application Component metadata is a giant cross reference of all of the
components that make up a system and how the components are shared and
re-used. Mainframe cross-reference tools
and desktop tools with repositories often are the place where this information
is stored.
· What application components are considered standard re-usable objects?
· How was this "re-usable object" determination made?
· How were these objects tested and who maintains these objects?
· What programs (& data & screens) are part of a system (or process or function)?
· What jobs (or procs or scripts) execute the programs?
· What data is used by the programs and jobs? How is the data used?
· How is the data passed from program to program, job to job, system to system?
· What system is the data dependent on? What system is dependent on the data?
· What programs and jobs are reused? Where are they reused?
· What changes have been made to the programs and jobs over time?
· Who wrote the programs and jobs?
· Who is responsible for supporting and maintaining the programs and jobs?
· What programs update the data?
· What reports display the data? What screens report the data?
· What programs use
· much more ...
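The "giant cross reference" described above can be thought of as a graph of which component uses which. As a hedged sketch, the snippet below walks such a graph to find everything that depends, directly or indirectly, on a given table. The component names and the `impacted_by` helper are hypothetical, and a real cross-reference tool would hold far more detail.

```python
# Hypothetical cross-reference: each component lists the components it uses.
uses = {
    "JOB_NIGHTLY": ["PGM_LOAD", "PGM_REPORT"],
    "PGM_LOAD": ["TBL_CUSTOMER"],
    "PGM_REPORT": ["TBL_CUSTOMER", "TBL_ORDER"],
    "SCR_LOOKUP": ["TBL_ORDER"],
}


def impacted_by(component, uses):
    """Return every component that directly or indirectly uses `component`."""
    impacted = set()
    frontier = [component]
    while frontier:
        current = frontier.pop()
        for user, used in uses.items():
            if current in used and user not in impacted:
                impacted.add(user)
                frontier.append(user)
    return sorted(impacted)


print(impacted_by("TBL_CUSTOMER", uses))
print(impacted_by("TBL_ORDER", uses))
```

This is the kind of lookup that answers "what programs update the data?" and "what jobs execute the programs?" before a change is made rather than after something breaks.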
Data Access / Reporting Metadata
Data Access and Reporting
metadata describes how to access the data and which reports have already been
created that can be read or recreated.
Data Access and Reporting metadata may also describe the steps that must
be taken to get authorization to read the data, how the data
can be interpreted, the available tools, and descriptions of reports. Data Access and Reporting metadata typically
is found within reporting tools and in traditional types of documentation
(e.g., desktop databases, word processing documents, and spreadsheets).
· What reports have been written that use the data?
· What is the description of a report?
· How do I access the reports?
· What steps should be taken to get authorization to use the data?
· How do the reports select, organize/sort, group, total and display the data?
· What data was used by my report?
· What reports use my data?
· When were the reports last updated?
· Do I have to execute the report myself or are the results already available?
· Where will I find the results?
· much more ...
Rationalization Metadata
Rationalization metadata
describes standard "corporate accepted" pieces of information and how
those pieces of information are represented in, or mapped to, data captured in the
systems. The standard pieces of
information can be a select list of data elements that have accepted meanings,
histories, and values, and/or they can come from
an enterprise data model.
Rationalization metadata can describe the degree to which data elements represent
the same piece of information, and where they differ. Rationalization metadata is often stored in
repositories and traditional types of documentation.
· What are the standard (core) elements that exist in the company?
· What are the business names and definitions of these elements?
· How were the standard elements chosen? By whom?
· Are the standard elements verified for re-use?
· Where do the standard elements map to existing data?
· How should the standard elements be used?
· much more ...
Data Quality Metadata
Data Quality metadata
describes the quality of the data. Data
Quality metadata describes the accuracy confidence level, the change
management, the history of the data values and definitions, and how changes over
time affect how data can be understood.
Data Quality metadata also describes what actions are taken when data is "bad",
missing, or duplicated.
Data Quality metadata is tracked using data quality tools,
repositories, and traditional documentation types.
· How have the accepted values of the data changed over time?
· When did the accepted values change?
· How has the definition of the data changed over time?
· When did the definition of the data change?
· What constitutes "bad" data?
· What quality checks were performed against my data?
· What are the quality check procedures? Who wrote and executed them?
· Who analyzed the results?
· With what level of confidence can I trust my data?
· What is the accepted level of confidence before the data is considered "low quality" data?
· much more ...
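A confidence level can be as simple as the share of values that pass the documented quality checks. The sketch below shows one possible calculation and is not a standard measure: the checks, the sample values, and the 0.95 `ACCEPTED_LEVEL` threshold are assumptions made up for illustration.

```python
def confidence(values, checks):
    """Share of values that pass every check, from 0.0 to 1.0."""
    if not values:
        return 0.0
    passed = sum(1 for value in values if all(check(value) for check in checks))
    return passed / len(values)


# Hypothetical quality checks for an order status column.
checks = [
    lambda value: value is not None,           # missing data
    lambda value: value in {"O", "S", "C"},    # accepted values
]

values = ["O", "S", None, "X", "C", "O", "O", "S"]
score = confidence(values, checks)

ACCEPTED_LEVEL = 0.95  # hypothetical threshold below which data is "low quality"
is_low_quality = score < ACCEPTED_LEVEL
print(score, is_low_quality)
```

Recording the checks, the score, and the threshold together is what lets someone later answer "what quality checks were performed against my data?" without rerunning them.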
Computer Operations Metadata
Computer Operations
metadata describes the activities of the data and scheduling center. Computer Operations metadata describes data
storage, tape usage, job operations, server operations, scheduling
dependencies, abend procedures, and backup and restore
procedures. Computer Operations metadata
can be found in scheduling systems, storage systems, and operating and server
systems.
· What operations / jobs are scheduled to run against my data?
· What types of data backup and recovery are available?
· When was the last time my data was backed up, restored, verified?
· What is the process for backing up and restoring data?
· Who is responsible for backup and recovery?
· Who has security privileges to use my data? Create, Read, Update, Delete?
· When is the best time to run a program/report against specific data?
· What operations are dependent on data from another process?
· What are the actions taken when a job or system fails or abends?
· Who should be called when a job or system fails?
· What version of the software are we running?
· If licensed, how many licenses do we have, who is using them?
· When are the licenses scheduled to expire?
· When is the next release of the software due to be installed?
· What changes/enhancements are being made to the software with the new release?
· How much disk storage is available?
· How much disk storage is being used? At what rate is the data growing?
· Who allocates storage and should be contacted for questions about disk storage?
· How are the tape storage headers defined?
· much more ...
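Two of the operations questions lend themselves to a small worked example: which jobs depend on data from another process, and at what rate storage is filling. The sketch below assumes Python 3.9 or later for the standard `graphlib` module. The job names, the capacity figures, and the `days_until_full` helper are hypothetical, and the projection assumes simple linear growth.

```python
from graphlib import TopologicalSorter

# Hypothetical schedule: job -> jobs that must finish first
depends_on = {
    "EXTRACT_ORDERS": [],
    "LOAD_WAREHOUSE": ["EXTRACT_ORDERS"],
    "BACKUP_WAREHOUSE": ["LOAD_WAREHOUSE"],
    "SALES_REPORT": ["LOAD_WAREHOUSE"],
}

# A run order in which every job comes after the jobs it depends on.
run_order = list(TopologicalSorter(depends_on).static_order())
print(run_order)


def days_until_full(capacity_gb, used_gb, daily_growth_gb):
    """Days of headroom left at a constant growth rate; None if not growing."""
    if daily_growth_gb <= 0:
        return None
    return (capacity_gb - used_gb) / daily_growth_gb


print(days_until_full(capacity_gb=1000, used_gb=400, daily_growth_gb=5))
```

The relative order of `BACKUP_WAREHOUSE` and `SALES_REPORT` is not fixed, since neither depends on the other; only the dependency constraints are guaranteed.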