DataSets

This page was pulled from the database on 2024-03-01.

In this system, a DataSet serves as a container for storing various types of data, including tables, views, database files, filegroups, CSVs, XLSs, etc. What sets this system apart is the structure of DataPoints within a DataSet. Almost all tables (99%) in this system share a common structure, with identical columns.

This consistency is intentional: the system is designed to model data itself rather than any one specific purpose. It models higher orders of likeness, departing from the conventional spreadsheet-style table structures commonly taught in universities. The design prioritizes the abstracted patterns the database management system needs over immediate ease of use. That same uniformity, however, makes it possible to build tools that ultimately make the system more user-friendly than traditional models.

Key columns in every table include:

PrimaryKey
An incrementing integer for unique identification.
Guid
A unique identifier.
TenantId
A bigint for multitenancy.
ClassifierId (Realm, Class, Family, Type)
Classifies the record.
DataSetNumber
An identifier unique to each dataset; every row in the dataset carries the same value.
Name/Description/JSONName
Descriptions for the thought represented within the table.
KeyName
Enables the creation of readable and migratable code.
DisplayOrder, Sequence
Both order records; DisplayOrder is alphanumeric and Sequence is numeric.
RerouteId and Flag
Used for self-cleansing; helps in fixing bad data as it comes in.
Active, Trusted flags
Indicate the active and trusted states of a record.
Security, Planned, Display, Other flags
Provide row-based security, time-related information, and other attributes.
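
As a rough illustration, the column list above can be sketched as a single record type. This is not the system's actual schema: the Python field names and every type other than the integer PrimaryKey and the bigint TenantId are assumptions, ClassifierId is assumed to be one column, and all flags are assumed to be booleans.

```python
from dataclasses import dataclass, fields
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class CommonRow:
    """Illustrative shape of the shared columns; most types are assumptions."""
    primary_key: int            # PrimaryKey: incrementing integer
    guid: UUID                  # Guid: unique identifier
    tenant_id: int              # TenantId: bigint for multitenancy
    classifier_id: int          # ClassifierId: Realm, Class, Family, or Type
    data_set_number: int        # DataSetNumber: same value on every row of a dataset
    name: str                   # Name
    description: str            # Description
    json_name: str              # JSONName
    key_name: str               # KeyName: readable, migratable key
    display_order: str          # DisplayOrder: alphanumeric ordering
    sequence: int               # Sequence: numeric ordering
    reroute_id: Optional[int]   # RerouteId: assumed nullable
    flag: bool                  # Flag paired with RerouteId (assumed boolean)
    active: bool
    trusted: bool
    security: bool
    planned: bool
    display: bool
    other: bool


# A hypothetical row, only to show the shape in use.
example = CommonRow(
    primary_key=1, guid=uuid4(), tenant_id=1, classifier_id=1,
    data_set_number=100, name="Shipped", description="Order has shipped",
    json_name="shipped", key_name="order_status_shipped",
    display_order="A1", sequence=1, reroute_id=None, flag=False,
    active=True, trusted=True, security=False, planned=False,
    display=True, other=False,
)
```

Counting the fields of this sketch gives 19, one per common column listed above.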

Despite these common columns, there are slight variations, such as the inclusion of DataSetTypeId in tbl_dataset and DataSetFamilyId in tbl_dataset_type. However, these variations do not deviate from the common set of 19 columns.

Name and Description describe the thought represented within the table; they are not necessarily a person's name. Alternative structures handle subjects with varying structures, accommodating cultural differences in naming conventions.

KeyName facilitates the creation of code that can be easily migrated across systems, providing a more readable and adaptable alternative to traditional numeric identifiers.
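
A minimal sketch of why this helps, assuming the same record exists in two environments with different numeric keys. The row data, the `order_status_shipped` key, and the `find_by_key_name` helper are hypothetical and not part of the system.

```python
# Hypothetical copies of one record in two environments: the numeric
# PrimaryKey differs between them, the KeyName does not.
DEV_ROWS = [{"PrimaryKey": 7, "KeyName": "order_status_shipped", "Name": "Shipped"}]
PROD_ROWS = [{"PrimaryKey": 412, "KeyName": "order_status_shipped", "Name": "Shipped"}]


def find_by_key_name(rows, key_name):
    """Return the first row whose KeyName matches, or None if there is none."""
    return next((row for row in rows if row["KeyName"] == key_name), None)


# The same lookup works unchanged in both environments.
dev = find_by_key_name(DEV_ROWS, "order_status_shipped")
prod = find_by_key_name(PROD_ROWS, "order_status_shipped")
```

Code written against a hard-coded PrimaryKey of 7 would need editing after migration; code written against the KeyName would not.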

DisplayOrder and Sequence serve similar purposes but work differently: DisplayOrder orders records alphanumerically and Sequence orders them numerically, which gives flexibility in how records are sorted.
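
The difference between the two orderings can be seen in a small sketch. It assumes DisplayOrder is compared as plain text (lexicographically) and Sequence as an integer; the rows are made up for illustration.

```python
# Hypothetical rows; only the ordering columns matter here.
rows = [
    {"KeyName": "a", "DisplayOrder": "10", "Sequence": 10},
    {"KeyName": "b", "DisplayOrder": "2", "Sequence": 2},
    {"KeyName": "c", "DisplayOrder": "1", "Sequence": 1},
]

# Alphanumeric ordering compares text, so "10" sorts before "2".
by_display_order = [r["KeyName"] for r in sorted(rows, key=lambda r: r["DisplayOrder"])]

# Numeric ordering compares integers, so 2 sorts before 10.
by_sequence = [r["KeyName"] for r in sorted(rows, key=lambda r: r["Sequence"])]

print(by_display_order)  # ['c', 'a', 'b']
print(by_sequence)       # ['c', 'b', 'a']
```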

RerouteId and Flag support data management, TenantId supports multitenancy, and the Security flag supports row-based security.
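
As one hedged illustration of the multitenancy part, a query layer might scope every read by TenantId. The `rows_for_tenant` helper, the sample rows, and the choice to also exclude inactive rows are assumptions for this sketch, not documented behavior.

```python
def rows_for_tenant(rows, tenant_id):
    """Keep only active rows that belong to the given tenant."""
    return [r for r in rows if r["TenantId"] == tenant_id and r["Active"]]


# Hypothetical rows belonging to two tenants.
all_rows = [
    {"PrimaryKey": 1, "TenantId": 100, "Active": True},
    {"PrimaryKey": 2, "TenantId": 200, "Active": True},
    {"PrimaryKey": 3, "TenantId": 100, "Active": False},
]

visible = rows_for_tenant(all_rows, 100)
```

Because every table carries the same TenantId column, one filter like this could in principle be reused across tables.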

The system's flexibility extends to offering row-based localized content, allowing for dynamic changes in content based on the user querying the data. All content can be changed based on the user’s pronouns, language preference, time of day, and even the number of animals they currently own.

Planned flags indicate that a row has time attached to it, allowing for the management of time-sensitive data. Examples include deploying data before it is active, making data available only during a campaign, and rows that represent internal or external processes that need to run.
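
A minimal sketch of how a planned row might be checked against the current time. The source does not say where the time values are stored, so the `StartsAt` and `EndsAt` fields, the `is_in_effect` helper, and the end-exclusive window are all assumptions.

```python
from datetime import datetime, timezone


def is_in_effect(row, now):
    """Unplanned rows are always in effect; planned rows only inside their window."""
    if not row["Planned"]:
        return True
    starts, ends = row.get("StartsAt"), row.get("EndsAt")
    if starts is not None and now < starts:
        return False  # deployed ahead of time, not yet active
    if ends is not None and now >= ends:
        return False  # window (for example, a campaign) has closed
    return True


# Hypothetical campaign row that is deployed early and expires later.
campaign = {
    "Planned": True,
    "StartsAt": datetime(2024, 3, 10, tzinfo=timezone.utc),
    "EndsAt": datetime(2024, 3, 20, tzinfo=timezone.utc),
}

before = is_in_effect(campaign, datetime(2024, 3, 1, tzinfo=timezone.utc))
during = is_in_effect(campaign, datetime(2024, 3, 15, tzinfo=timezone.utc))
```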

The thoughtfully designed system addresses challenges and shortcomings across the data industry, resulting in a product that aligns with the evolving needs of data management.