Fulton Wilcox on Application of Practical Nominalism to Data Management at the MIT Information Quality Industry Symposium, July 17, 2008

On July 17, 2008, I will be speaking at the MIT Information Quality Industry Symposium on the subject of the "Application of Practical Nominalism to Data Management." For information on the Symposium, see http://mitiq.mit.edu/IQIS/2008/. From a quality perspective, this talk addresses the "fitness for use" of data. It draws on two principal themes: the philosophic nominalists' emphasis on particularity over abstraction, and the von Neumann architecture's bifurcation of "logic" from "data." The practical thrust of the talk is the utility of an architecture in which "rules engines" run against granular, highly particular data, as opposed to a models-oriented architecture that intermixes rules and conventions with raw data in ways that make it difficult to meet diverse information, regulatory and access control requirements. This somewhat counter-cultural talk is of particular relevance to cross-entity data sharing (intelligence, healthcare and the like), where "models" and ontologies cannot scale to encapsulate the real-world complexities involved.
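
As a rough illustration of that architectural contrast (a minimal sketch, not an excerpt from the talk; the customer records, field names and the 90-day rule are assumed purely for illustration), compare a "models-oriented" store, in which a rule was fused into the data at write time, with an event store queried by a stand-in rules engine:

```python
# Minimal sketch: "models-oriented" store vs. granular event store.
# All record layouts and the 90-day rule are hypothetical.

from datetime import date

# Models-oriented: the rule ("a customer is 'active' if they ordered within
# the last 90 days, as of load time") was applied when the data was stored.
MODELED_CUSTOMERS = [
    {"customer_id": "c1", "status": "active"},
    {"customer_id": "c2", "status": "inactive"},
]

# Nominalist: only particular transaction events are stored.
ORDER_EVENTS = [
    {"customer_id": "c1", "order_date": date(2008, 6, 30)},
    {"customer_id": "c2", "order_date": date(2008, 2, 15)},
]

def is_active(customer_id, as_of, window_days):
    """Rules-engine stand-in: derive 'active' from raw events on demand."""
    return any(
        e["customer_id"] == customer_id
        and (as_of - e["order_date"]).days <= window_days
        for e in ORDER_EVENTS
    )

# A new requirement (a 180-day window, or a different as-of date) can be met
# from the event store simply by changing the rule's parameters ...
print(is_active("c2", as_of=date(2008, 7, 17), window_days=180))  # True

# ... but the modeled store can only repeat the answer its frozen rule produced.
print(MODELED_CUSTOMERS[1]["status"])  # "inactive"
```

The point of the sketch is only this: once rules are baked into stored data, every new information, regulatory or access requirement forces a re-sourcing exercise; when the granular data is kept and the rules run against it, the same store can serve many requirements.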

Abstract
Application of Practical Nominalism to Data Management

Many information quality problems have as a root cause an over-reliance on the ontological notion that "entities" are "real," while events and transactions are merely transitory manifestations of those "real" entities in action. The nominalist position is that an "entity" such as the "Massachusetts Institute of Technology" is not "real," but merely a name tagging a flow of transactions and events, so that what the entity "is" by definition differs from day to day. Nominalism has been used to explain the "King Canute" impediments to creating taxonomies and ontologies: just as a taxonomy is defined, more events and transactions flood in and throw it into disarray.

On the other hand, there is a more positive perspective. Our capability to improve data quality will benefit if we exploit the growing power of our technology to run systems processes directly against transaction data and event data and, as a corollary, minimize reliance on "synthesized" data. Synthesized data looks "real" and may even look like an event, but in fact it has been produced by applying rules and conventions to genuine transaction and event data. For example, a reported number of MIT employees is inherently "synthetic" data, because it fuses "realist" notions of what constitutes "MIT," what constitutes employment, which detailed rules (e.g., hours per week) apply to individual transactions, and many other conventions.
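
To make the headcount example concrete, here is a minimal sketch (illustrative only; the record fields, unit names and the 17.5-hour threshold are assumptions, not actual MIT conventions) of how a headcount is synthesized by running rules against raw, particular employment records:

```python
# Minimal sketch: a headcount is "synthetic" data, produced only when
# explicit rules and conventions are applied to raw, granular records.
# Fields, units and thresholds below are hypothetical.

from dataclasses import dataclass

@dataclass
class EmploymentRecord:
    person_id: str        # particular fact: who
    unit: str             # particular fact: where the person is paid from
    hours_per_week: float
    active: bool

# Raw transaction/event-level data: each row is a particular fact.
records = [
    EmploymentRecord("p1", "physics", 40.0, True),
    EmploymentRecord("p2", "library", 12.0, True),
    EmploymentRecord("p3", "lincoln_lab", 40.0, True),
    EmploymentRecord("p4", "physics", 40.0, False),
]

def headcount(records, min_hours=17.5, include_units=None):
    """Synthesize a headcount by applying rules to raw records.

    min_hours     -- convention deciding what "counts" as employment
    include_units -- convention deciding what "counts" as the institution
    """
    return sum(
        1
        for r in records
        if r.active
        and r.hours_per_week >= min_hours
        and (include_units is None or r.unit in include_units)
    )

print(headcount(records))                             # 2
print(headcount(records, min_hours=10))               # 3
print(headcount(records, include_units={"physics"}))  # 1
```

Changing the hours threshold or the unit scope changes the reported number, yet no raw record is touched; the quality question shifts from the data itself to the rules applied to it.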

The evolution of technology favors this nominalist approach because of increased processing capacity and the emergence of SOA (service-oriented architecture), rules engines and network services. The nominalist design approach is also liberating in that our raw data, the informational "gold," will not be held captive inside synthetic outputs. It also greatly assists in supporting privacy, security and due process, because granular data is far easier to isolate and control.
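
As a minimal sketch of the data isolation point (illustrative only; the roles, record fields and policy below are assumptions, not a real system), keeping granular events separate from the rules that consume them lets access control and redaction be applied to the raw records before any rule runs:

```python
# Minimal sketch: because rules and raw data are separated, the same rule
# can be run over whatever subset of granular records a given requester is
# entitled to see. Roles, fields and policy are hypothetical.

RAW_EVENTS = [
    {"patient_id": "a1", "provider": "clinic_x", "event": "visit", "diagnosis": "flu"},
    {"patient_id": "a2", "provider": "clinic_y", "event": "visit", "diagnosis": "asthma"},
    {"patient_id": "a1", "provider": "clinic_x", "event": "lab", "diagnosis": None},
]

# Access policy lives alongside the rules, not baked into the data.
POLICY = {
    "clinic_x_analyst": lambda e: e["provider"] == "clinic_x",
    "public_health":    lambda e: True,  # sees all events, identities redacted below
}

def redact(event, role):
    """Drop identifying fields for roles that only need aggregate views."""
    if role == "public_health":
        return {k: v for k, v in event.items() if k != "patient_id"}
    return event

def authorized_view(role):
    """Granular events, filtered and redacted per role before any rule runs."""
    allow = POLICY[role]
    return [redact(e, role) for e in RAW_EVENTS if allow(e)]

def visit_count(events):
    """An example 'rule' run against whatever view the requester is allowed."""
    return sum(1 for e in events if e["event"] == "visit")

print(visit_count(authorized_view("clinic_x_analyst")))  # 1
print(visit_count(authorized_view("public_health")))     # 2
```

Because isolation operates on particular records rather than on synthesized outputs, the same event store can serve very different requesters without leaking what each is not entitled to see.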