User-Defined DataTypes: The Right Way

In our system, we refer to DataTypes as DataPointTypes, and a DataPoint is essentially an abstracted name for a column in a table. The goal is to build a system that can support various industries, companies, and data types. To accommodate this diversity, we need to define what data is and what the system requires to operate effectively.

If you're working in an environment that makes it challenging to declare user-defined datatypes (like MySQL), it becomes crucial to ensure that all tables are automatically created.

Now, let's break down the concept of DataPointTypes. By defining these types, we maintain consistency in the system (ensuring that the length of a name in one table isn't different in another) and provide ourselves with a tool for self-documentation.

In our system, about 99% of tables will have identical columns. This is intentional to avoid altering tables every time new fields are needed, a topic we'll delve into further in the DataSet portion.

To illustrate, let's consider a common "PERSON" table with columns like Id, FirstName, LastName, and BirthDate. Compare this with our system's "tbl_person" and "tbl_person_name" tables, which have about 20 columns each (though the examples are simplified).

And our tables:

The system aims to make working with these tables easy, eliminating the need to write SQL for reports. This is particularly relevant in the age of machine learning and artificial intelligence, where common models might break, and we're designing a system to handle anything and everything.

For instance, a person's name could be as long as Pablo Picasso's full name. To adapt to such scenarios, we need to change how we think about - and classify - data. BirthDate becomes an Event, and actions like login and logout are also considered events.

In this mindset, a first and last name don't uniquely identify a person, so they don't belong in tbl_person. Instead, a person is viewed as a thought, and everything fits into the same abstracted container, emphasizing relationships between thoughts and the system's data management requirements.

Now, let's categorize DataPointTypes into Realms, Classes, and Families:

  • Realms

    UNKNOWN, UNASSIGNED, DATE, STRING, NUMERIC.

  • Classes

    UNKNOWN, UNASSIGNED, DATE, DATETIME, JSON, TEXT, INTEGER, DECIMAL.

  • Families

    SYSTEM, DESCRIPTOR, KEY, EXTENSION, and RCFT

DataPointTypes are essentially user-defined types or domains (in the context of Postgres). They relate to native SQL types but are created as specific domains. Examples include PRIMARY KEY, ROW GUID (unique identifier), TENANT (foreignkey), REALM (foreignkey), CLASS (foreignkey), FAMILY (foreignkey), TYPE (foreignkey), NAME (nvarchar), DESCRIPTION (nvarchar), ACTIVE (bit), TRUSTED (bit), REROUTE (bit), DISPLAY (bit), PLANNED (bit), SECURITY (bit), among others.

In our system, a DataPoint is a column in a DataSet, functioning like an object with more than just a value. It has a type, realm, and information about the DataSet, Database, and their relationships within the system.

This shift in mindset is crucial; what used to be a single value (x,y), now includes a rich source of information about its place in the system.

Copyright © 2020 - Elric Sims