Perhaps you need to streamline your team’s operations, or ensure that multiple teams are working from a single source of truth. Maybe you’re in charge of your department’s data quality, or you have some other data-related responsibility. Whether you’re new to databases or an old pro at SQL, you can likely benefit from a refresher in relational database design.
Big and small organizations alike use relational databases to more efficiently store, manage, and analyze critical information, for purposes as disparate as customer relationship management, content production, product planning, UX research, and many more.
Not all relational databases are created equal, however. A poorly designed database might make it more difficult to access the information that you need or jeopardize the accuracy of your data; in contrast, a well-designed database provides several benefits:
- You can avoid redundant, duplicate, and invalid data. Problematic data can undermine the validity of your database, but you can design your relational database to minimize the risks posed by low-quality data.
- You can avoid situations where you are missing required data. If you can identify ahead of time which types of data are most critical to your workflow, you can structure your database in such a way that it enforces proper data entry, or alerts users when records are missing critical data.
- The database structure is easy to modify and maintain. Workflows rarely stay the same forever, and as such you will likely have to make some adjustments to your core relational data model in the future. Fortunately, a well-designed database ensures that any modifications you make to fields in one table will not adversely affect other tables.
- The data itself is easy to modify. In a similar fashion, a well-designed relational database ensures that modifications made to the values in a given field in one table will not adversely affect other fields in that table.
- It’s easier to find the information that you need. With a consistent, logical database structure (that avoids duplicate fields and tables), it’s much easier to query your database.
- You can spend less time fixing your database and more time doing other kinds of work. The best database is one that you don’t have to worry about.
You could build a house without first finalizing the blueprints, but in doing so, you might end up with a house of questionable structural integrity; similarly, taking some time to think carefully about the design of your relational database before implementing it can save you a lot of trouble in the long run.
All this might seem daunting if you’ve only just begun learning about relational databases, or even if you’ve already built a couple of databases and encountered some challenges. In this tutorial, we’ll cover several design principles that you can follow to help you build better databases.
What is a "well-designed" database?
So clearly, good database design is important when trying to build a database that works for you. But what does it actually mean for a database to be well-designed?
Nerd alert: The first person to devise an abstract model for database management was Edgar Frank Codd, an English computer scientist who joined IBM in the 1940s.
A well-designed database enforces data integrity
Data integrity refers to the overall accuracy, completeness, and consistency of the data in your database; a well-designed database maintains data integrity by implementing the processes and standards proposed during the design phase.
Data integrity includes three specific technical aspects of a relational database’s structure:
- Entity integrity (or table-level integrity) ensures that a table has no duplicate records, and that the values of the table’s primary keys are all unique and not null.
- Domain integrity (or field-level integrity) ensures that the purpose of every field is clear and identifiable, and that the values in each field are valid, consistent, and accurate.
- Referential integrity (or relationship-level integrity) ensures that the relationships between pairs of tables are sound, so that the records in the tables are synchronized whenever data is entered into, updated in, or deleted from either table.
A well-designed database enforces relevant business rules
Every organization does its work a little differently, and as such, each organization has its own unique requirements for its data, also known as business rules. If your company is the size of Microsoft, you’ll use databases differently than if you’re a small video operation.
Still, even among video companies, business rules are rarely the same. Let’s say your video company requires that timestamps for video files be noted in milliseconds, and each video client is assigned an eight-character alphanumeric code.
Ideally your team’s database should be able to enforce your unique business rules; this will ensure that the data coming into your base is both accurate and useful. For example, if the video files in the above example were noted in seconds instead of milliseconds, and then entered into the company database, it could create a major snag in video production—and a mess for a database administrator to clean up later.
How to design your relational database, step by step
If this all sounds unfamiliar or overwhelming, don’t worry—there is a systematic process you can follow that will ensure your relational database management system (or RDBMS) follows good design principles, is well-suited to your organization’s needs, and avoids common pitfalls.
Step 1: define your purpose and objectives
Before beginning your database design journey, it’s worth taking a step back and answering a very important question: “Why am I making this database?” Are you making this database in order to manage business transactions? To store customer IDs? To solve a particular organizational problem? Whatever the case, it’s worth taking the time to identify the intended purpose of the database you’ll be creating.
You may even wish to work together with managers, leadership, and end users to jointly write out a mission statement for your database, like: “The purpose of the Mingei International Museum database is to maintain the data for our art collection,” or “Zenbooth’s database will store all of the data for our manufacturing resource planning.”
Additionally, you should define the objectives that the end users of the database will have: which specific tasks will the end users need to perform in order to accomplish their work? Developing an explicit list of objectives—like “Know the status and location of each of the pieces of art in our collection at all times,” or “Maintain a complete customer table that shows records for each of our clients.” This will help you determine an appropriate structure, or database schema, for your information as you work through this design process.
Step 2: analyze data requirements
Before you begin designing your database, you’ll need to analyze your organization’s data fundamentals and requirements. This might sound intimidating, but all it means is that you’ll be assessing how your team currently does its work, and identifying what kind of data is most important to that work. You can do this by closely examining existing processes and by interviewing team members—both management and end users. Some questions to ask as you conduct your research:
- How is your organization currently collecting data? Are you using spreadsheets? Paper templates? Another database? Whichever of these methods you’re using, find the most complete samples of work that you can, and look through them to find as many different attributes as you can. For example, your editorial calendar might currently be living in a spreadsheet, and have columns for “Author,” “Due Date,” “Editor,” and so on.
- How is your organization currently presenting data? What kinds of reports does your organization use? PDFs? Slide decks? Web pages? Carefully examine any types of presentations that incorporate data from your current data collection methods and use them to identify potential fields.
- How are your team members currently using data? The best way to determine the answers to this question is by talking to team members—both management and end users—to identify their current data use patterns and case studies, as well as any gaps in the current system. You can ask questions like, “What types of data are you currently using?” and have them review the samples you collected. Importantly, these interviews can also illuminate plans for the future growth of the organization, which will give you insight into future information requirements and the type of relational database model that would be the best fit.
Step 3: create a list of entities and a list of attributes
After settling on your organization’s purpose and objectives, and analyzing your data requirements, the next steps are to extract a list of entities and a list of attributes from the body of research you’ve compiled. In the context of relational databases, an entity is an object, person, place, event, or idea—like “clients,” “products,” “projects,” or “sales reps.” Attributes are the defining characteristics of those entities, like “name,” “quantity,” “address,” “phone number,” or “genre.” One way you can think about this is that entities are like nouns, and attributes are like the adjectives that describe those nouns.
Start by picking out entities from your research and putting them on a list. These entities will eventually serve as a guide to help you define your tables later on in the design process, but they will also help you identify the attributes necessary to create your list of fields. For example, if you were developing a talent database for a record label, your entities list might look something like this:
Next, create a separate list containing the relevant attributes for each of the entities you’ve identified, as well as any dependencies the attributes might have on one another. These attributes will define the fields for your tables. Again, for the talent database example, your attributes list might look something like this:
- Artist Name
- Artist Phone Number
- Agent Name
- Agent Phone Number
- Agent Email Address
- Venue Name
- Venue Address
- Gig Dates
Once you’ve collected a preliminary list of attributes, you should go through and refine them to make sure that they accurately represent your organization’s informational needs.
- If multiple attributes have different names but actually represent the same concept, deduplicate them so there’s only one. For example, if you have both “Product No.” and “Product Number” on your list, you should remove one of them.
- If multiple attributes have similar names but actually represent different concepts, rename the attributes to be more specific. For example, you could rename two different “Name” attributes into the more specific “Artist Name” and “Venue Name.”
After refining your lists, it’s a good idea to review these lists that you’ve compiled with some of the team members you interviewed to confirm that you’ve accounted for every necessary type of data. Be sure to take their feedback into consideration and further refine your lists as appropriate.
Step 4: model the tables and fields
After creating your lists of entities and attributes, your task is to use those lists to design the structure of your relational database. Your list of entities will become separate tables in your base, and the list of attributes will become the fields for these tables.
Take your lists and assign each of the attributes to your tables. For example, after we finish assigning our listed attributes to our new tables, our talent management database-in-planning might look something like this:
Next, you need to pick an appropriate primary key field for each table. A primary key is a major component of ensuring data integrity, as it uniquely identifies each record within a table and is used to establish different types of relationships between tables.
Each table’s primary key field should meet the following criteria:
- It must contain unique identifiers. This will prevent you from creating duplicate records and redundancy within a table.
- It cannot contain null values. A null value is the absence of a value, and as such, you cannot use a null value to identify a record.
- It should not be a value that will need to be modified often. Ideally, primary key values will remain relatively static over time and only be changed under rare circumstances.
- It should not be sensitive personal information. Social security numbers, passwords, personal health information (PHI) and other sensitive data types may be unique, but should not be used as primary keys, as doing so makes it much harder to keep this data appropriately secured in a single location.
- Ideally, it uses the table name as part of its own name. While not strictly necessary, having the table name in the primary key field name can make it easier to identify the table from which the primary key field originated. For example, “Employee Name” would be obviously identifiable as coming from the related table, “Employees.”
It might be possible that your preliminary list of fields for a given table does not contain one single field that meets all these criteria. However, you may be able to combine two or more fields to create a separate, unique field which does meet all these criteria. For example, you could combine the values in a “Given Name” field and a “Surname” field to create a third, calculated “Full Name” field using a concatenation formula.
In the event that this strategy still doesn’t work for your purposes, you can always manufacture a new field designated specifically for unique identification codes to serve as your primary key field. Fields like “Product ID” or “Sales Invoice Number” are often created for this purpose.
Let’s return to our talent management database example. For the “Artists” table, the “Artist name” field is already a pretty good candidate for the primary key, as it’s pretty unlikely that your record label will sign two artists with the same name. We can also pick “Venue name” as the primary key for the “Venues” table. For the other tables, however, it would probably be better to make new fields that concatenate values from existing fields. In the “Agents” table, we might make a new field—“Agent full name”—that concatenates the values of the “Agent given name” and “Agent surname” fields.
For the “Gigs” table, an artist could perform at the same venue on multiple occasions, so we should make a new field that gives a unique name to the specific combination of an artist at a venue on a specific date. You could potentially concatenate the name of the artist, the venue, and the date to create values like “2 Linkz at the Gotham City Metro Club, 02/13/2019,” but that can get long and unwieldy fast. Alternatively, you could try creating a new field—“Gig code”—with unique alphanumeric code values (like “E0023”). Whichever approach you take is up to you and your team’s specific needs.
Ready to design a relational database in Airtable? Check out our guide to building a base, where you can use a template as a model or start from scratch.