On this page, we explain why good definitions matter, describe what makes a definition good, and outline some helpful steps for writing them.
Motivating good definitions: Why do we care?
Good definitions are well worth the effort. In fact, they are central to data governance, and creating and approving them is one of the major tasks of data governance teams. A good definition gives a clear picture of what the data asset is (and, by extension, what it isn't). It prevents the contradictions and ambiguities that cause problems when interpreting and organizing data, especially large amounts of data spread across numerous databases.
Characteristics of good definitions
Good definitions are:
- Specific and disambiguating
- Concise
- Accessible to the full breadth of the target audience
Philosopher H. Paul Grice wrote extensively about how to package information for successful communication. You could say that good definitions follow his Maxims of Conversation (note 1).
Specific and disambiguating
A good definition, especially for the purposes of data governance, identifies the central characteristics of an asset (it is specific), including the characteristics that set the asset apart from others (it is disambiguating).
Concise
A good definition is as brief as possible, while still being specific and disambiguating. In other words, a good definition cuts out all text that does not contribute to the unambiguous identification of an asset.
See Helpful steps for creating good definitions, below, for how to identify text to cut from a definition.
Accessible to the full breadth of the target audience
A good definition can be read and understood with ease by any user. Its meaning should be easy to grasp for an educated reader who lacks insider knowledge (e.g., an external consultant unfamiliar with project specifics). Writing for all readers extends the reach and usefulness of the definition.
Helpful steps for creating good definitions
In this section, we'll go over a few different skills for definition writing, including things to do and things to avoid.
Things to do
Seek out unique characteristic(s)
When writing a definition, consider the asset to be defined and ask yourself: what distinguishes it from similar assets? This characteristic or set of characteristics should be central to your definition.
Write true and relevant definitions
If a definition is false, it is worse than useless: it is misleading. Errors happen, sometimes through typos and sometimes through a lack of information. In either case, having many people review a definition before it is approved helps catch false definitions. This is why definition approval and data quality management are central to data governance!
Use phrases like "is a", "has a", and "used for"
Definitions are stronger when they identify the type or class of the asset ("is a"), provide central identifying characteristics of the asset ("has a"), and note the general function of the asset ("used for").
For more, see Stanford University Data Stewardship's Data Definitions Best Practices (note 2).
Things to avoid
Nested definitions
When writing a definition, do not include nested definitions (definitions within a definition). This adds unnecessary complexity to the definition.
Collibra's Hyperlinking feature provides an easy alternative: a link to another defined asset provides readers with a simple way to learn more about an unknown term within a definition.
Lists
The purpose of a definition is to pinpoint the meaning of a concept, not to provide every possible example of that concept. Lists of examples should be avoided, as should lists of descriptors, unless they are key to a specific and unambiguous definition.
If the listed examples are also assets, they can be linked to the defined asset using a relation. See Relation Types.
Using a synonym as a definition
A definition should describe an asset. A synonym only provides another name for the asset; it does not describe the asset's characteristics, so it is not a definition.
Circular definitions
A definition should not include the name of the asset being defined. This matters especially when developing a business semantics glossary or other repository of definitions: two assets should not be defined in such a way that each depends on the other for its definition. Instead, look for the unique characteristics of the asset and use those to define it.
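In a large glossary, both kinds of circularity described above can be screened for mechanically before human review. The Python sketch below is only an illustration and is not a feature of any particular glossary tool. It assumes the glossary has been exported as a simple mapping from asset name to definition text, and the sample entries are hypothetical. It uses naive whole-word, case-insensitive matching, so it will miss plurals and other variant forms of a name.

```python
import re

# Hypothetical glossary export: asset name -> definition text.
# The format and the entries are assumptions for illustration only.
glossary = {
    "Customer": "A person or organization that holds an Account.",
    "Account": "A record owned by a Customer.",
    "Invoice": "An Invoice document used for billing.",
    "Supplier": "A business that provides goods or services to the company.",
}


def mentions(term: str, text: str) -> bool:
    """True if `term` appears in `text` as a whole word (case-insensitive)."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def self_references(glossary: dict[str, str]) -> list[str]:
    """Assets whose definition contains the asset's own name."""
    return [name for name, text in glossary.items() if mentions(name, text)]


def mutual_references(glossary: dict[str, str]) -> list[tuple[str, str]]:
    """Pairs of assets whose definitions each mention the other."""
    names = sorted(glossary)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Flag only when the reference goes both ways.
            if mentions(b, glossary[a]) and mentions(a, glossary[b]):
                pairs.append((a, b))
    return pairs


print(self_references(glossary))    # ['Invoice']
print(mutual_references(glossary))  # [('Account', 'Customer')]
```

This check flags only direct pairs, which is the case described above. A longer chain (A refers to B, B to C, and C back to A) would need a cycle search over the full graph of references. In either case, a reviewer still decides whether each flagged entry is a real problem.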
Obscure or overly technical language
Avoid definitions that rely on technical language or assume specialized knowledge on the reader's part. Wherever possible, replace technical language or jargon with simple explanations.
1 Grice, H. Paul (1975). "Logic and conversation". In P. Cole and J. Morgan (Eds.), Syntax and Semantics, Vol. 3: Speech Acts (pp. 41–58). New York: Academic Press.
2 Stanford University Data Stewardship website: http://dg.stanford.edu