If you need to streamline your team’s operations, ensure that multiple teams in your org are working from a single source of truth, maintain a canonical inventory, or perform any other data-related responsibility, your team can benefit from a relational database.
Big and small organizations alike use relational databases to more efficiently store, manage, and analyze critical information, for purposes as disparate as customer relationship management, content production, product planning, UX research, and many more.
Not all relational databases are created equal, however. A poorly designed database might make it more difficult to access the information that you need or jeopardize the accuracy of your data; in contrast, a well-designed database provides several benefits:
- You can avoid redundant, duplicate, and invalid data. Problematic data can undermine the validity of your database, but you can design your relational database to minimize the risks posed by low-quality data.
- You can avoid situations where you are missing required data. If you can identify ahead of time which types of data are most critical to your workflow, you can structure your database in such a way that it enforces proper data entry, or alerts users when records are missing critical data.
- The database structure is easy to modify and maintain. Workflows rarely stay the same forever, and as such you will likely have to make some adjustments to your database structure in the future. Fortunately, a well-designed relational database ensures that any modifications you make to fields in one table will not adversely affect other tables.
- The data itself is easy to modify. In a similar fashion, a well-designed relational database ensures that modifications made to the values in a given field in one table will not adversely affect other fields in that table.
- It’s easier to find the information that you need. With a consistent, logical database structure (that avoids duplicate fields and tables), it’s much easier to query your database.
- You can spend less time fixing your database and more time doing other kinds of work. The best database is one that you don’t have to worry about.
You could build a house without first finalizing the blueprints, but in doing so, you might end up with a house of questionable structural integrity; similarly, taking some time to think carefully about the design of your relational database before implementing it can save you a lot of trouble in the long run.
All this might seem daunting if you’ve only just begun learning about relational databases, or even if you’ve already built a couple of databases and encountered some challenges. Fortunately, there are several design principles you can follow that will help you build better databases.
What is a "well-designed" database?
So clearly, good database design is important when trying to build a database that works for you. But what does it actually mean for a database to be well-designed?
A well-designed database enforces data integrity
Data integrity refers to the overall accuracy, completeness, and consistency of the data in your database; a well-designed database maintains data integrity by implementing the processes and standards proposed during the design phase.
Data integrity includes three specific technical aspects of a relational database’s structure:
- Entity integrity (or table-level integrity) ensures that a table has no duplicate records, and that the values of the table’s primary keys are all unique and not null.
- Domain integrity (or field-level integrity) ensures that the purpose of every field is clear and identifiable, and that the values in each field are valid, consistent, and accurate.
- Referential integrity (or relationship-level integrity) ensures that the relationships between pairs of tables are sound, so that the records in the tables are synchronized whenever data is entered into, updated in, or deleted from either table.
A well-designed database enforces relevant business rules
Every organization does its work a little differently, and as such, each organization has its own unique requirements for its data, also known as business rules. Some hypothetical examples might be: a video production company requires that the duration timestamps for its video files be stored in milliseconds; a consumer brand requires that each product in its catalog be assigned a unique eight-character alphanumeric code; a university requires that its students sign up for no more than six classes per semester.
Ideally your team’s database should be able to enforce your unique business rules; this will ensure that the data coming into your base is both accurate and useful. For example, if the university in the previous example forgot to account for their business rules when building their database, a student might try to sign up for 20 classes in a single semester—creating a mess for a database administrator to clean up later.
How to design your relational database, step by step
If this all sounds unfamiliar or overwhelming, don’t worry—there is a systematic process you can follow that will ensure your relational database follows good design principles, is well-suited to your organization’s needs, and avoids common pitfalls.
Step 1: define your purpose and objectives
Before beginning your database design journey, it’s worth taking a step back and answering a very important question: “Why am I making this database?” Are you making this database in order to manage business transactions? To store information? To solve a particular organizational problem? Whatever the case, it’s worth taking the time to identify the intended purpose of the database you’ll be creating.
You may even wish to work together with managers, leadership, and end users to jointly write out a mission statement for your database, like: “The purpose of the Mingei International Museum database is to maintain the data for our art collection,” or “Zenbooth’s database will store all of the data for our manufacturing resource planning.”
Additionally, you should define the objectives that the end users of the database will have: which specific tasks will the end users need to perform in order to accomplish their work? Developing an explicit list of objectives—like “Know the status and location of each of the pieces of art in our collection at all times,” “Produce monthly reports on customer sales,“ or “Maintain complete records for each of our clients”—will help you determine an appropriate structure for your database as you work through this design process.
Step 2: analyze data requirements
Before you begin designing your database, you’ll need to analyze your organization’s data requirements. This might sound intimidating, but all it means is that you’ll be assessing how your team currently does its work, and identifying what kind of data is most important to that work. You can do this by closely examining existing processes and by interviewing team members—both management and end users. Some questions to ask as you conduct your research:
- How is your organization currently collecting data? Are you using spreadsheets? Paper templates? Another database? Whichever of these methods you’re using, find the most complete samples of work that you can, and look through them to find as many different attributes as you can. For example, your editorial calendar might currently be living in a spreadsheet, and have columns for “Author,” “Due Date,” “Editor,” and so on.
- How is your organization currently presenting data? What kinds of reports does your organization use? PDFs? Slide decks? Web pages? Carefully examine any types of presentations that incorporate data from your current data collection methods and use them to identify potential fields.
- How are your team members currently using data? The best way to determine the answers to this question is by talking to team members—both management and end users—to identify their current data use patterns as well as any gaps in the current system. You can ask questions like, “What types of data are you currently using?” and have them review the samples you collected. Importantly, these interviews can also illuminate plans for the future growth of the organization, which will give you some insight into future information requirements.
Step 3: create a list of entities and a list of attributes
After settling on your organization’s purpose and objectives, and analyzing your data requirements, the next steps are to extract a list of entities and a list of attributes from the body of research you’ve compiled. In the context of relational databases, an entity is an object, person, place, event, or idea—like “clients,” “products,” “projects,” or “sales reps.” Attributes are the defining characteristics of those entities, like “name,” “quantity,” “address,” “phone number,” or “genre.” One way you can think about this is that entities are like nouns, and attributes are like the adjectives that describe those nouns.
Start by picking out entities from your research and putting them on a list. These entities will eventually serve as a guide to help you define your tables later on in the design process, but they will also help you identify the attributes necessary to create your list of fields. For example, if you were developing a talent database for a record label, your entities list might look something like this:
Next, create a separate list containing the relevant attributes for each of the entities you’ve identified, as well as any other attributes that might have come up during your research. These attributes will define the fields for your tables. Again, for the talent database example, your attributes list might look something like this:
- Artist Name
- Artist Phone Number
- Agent Name
- Agent Phone Number
- Agent Email Address
- Venue Name
- Venue Address
- Gig Dates
Once you’ve collected a preliminary list of attributes, you should go through and refine them to make sure that they accurately represent your organization’s informational needs.
- If multiple attributes have different names but actually represent the same concept, deduplicate them so there’s only one. For example, if you have both “Product No.” and “Product Number” on your list, you should remove one of them.
- If multiple attributes have similar names but actually represent different concepts, rename the attributes to be more specific. For example, you could rename two different “Name” attributes into the more specific “Artist Name” and “Venue Name.”
After refining your lists, it’s a good idea to review these lists that you’ve compiled with some of the team members you interviewed to confirm that you’ve accounted for every necessary type of data. Be sure to take their feedback into consideration and further refine your lists as appropriate.
Step 4: model the tables and fields
After creating your lists of entities and attributes, your task is to use those lists to design the structure of your relational database. Your list of entities will become the different tables in your base, and the list of attributes will become the fields for these tables.
Take your lists and assign each of the attributes to your tables. For example, after we finish assigning our listed attributes to our new tables, our talent management database-in-planning might look something like this:
Next, you need to pick an appropriate primary key field for each table. A primary key is a major component of ensuring data integrity, as it uniquely identifies each record within a table and is used to establish relationships between different tables.
Each table’s primary key field should meet the following criteria:
- It must contain unique values. This will prevent you from creating duplicate records within a table.
- It cannot contain null values. A null value is the absence of a value, and as such, you cannot use a null value to identify a record.
- It should not be a value that will need to be modified often. Ideally, primary key values will remain relatively static over time and only be changed under rare circumstances.
- It should not be sensitive personal information. Social security numbers, passwords, personal health information (PHI) and other sensitive data may be unique, but should not be used as primary keys, as doing so makes it much harder to keep this data appropriately secured in a single location.
- Ideally, it uses the table name as part of its own name. While not strictly necessary, having the table name in the primary key field name can make it easier to identify the table from which the primary key field originated. For example, “Employee Name” would be obviously identifiable as coming from the “Employees” table.
It might be possible that your preliminary list of fields for a given table does not contain one single field that meets all these criteria. However, you may be able to combine two or more fields to create a separate, unique field which does meet all these criteria. For example, you could combine the values in a “Given Name” field and a “Surname” field to create a third, calculated “Full Name” field using a concatenation formula.
In the event that this strategy still doesn’t work for your purposes, you can always manufacture a new field designated specifically for unique identification codes to serve as your primary key field. Fields like “Product ID” or “Sales Invoice Number” are often created for this purpose.
Let’s return to our talent management database example. For the “Artists” table, the “Artist name” field is already a pretty good candidate for the primary key, as it’s pretty unlikely that your record label will sign two artists with the same name. We can also pick “Venue name” as the primary key for the “Venues” table. For the other tables, however, it would probably be better to make new fields that concatenate values from existing fields. In the “Agents” table, we might make a new field—“Agent full name”—that concatenates the values of the “Agent given name” and “Agent surname” fields. For the “Gigs” table, an artist could perform at the same venue on multiple occasions, so we should make a new field that gives a unique name to the specific combination of an artist at a venue on a specific date. You could potentially concatenate the name of the artist, the venue, and the date to create values like “2 Linkz at the Gotham City Metro Club, 02/13/2019,” but that can get long and unwieldy fast. Alternatively, you could try creating a new field—“Gig code”—with unique alphanumeric code values (like “E0023”). Whichever approach you take is up to you and your team’s specific needs.
Step 5: establish table relationships
Once you’ve identified your tables, fields, and primary key fields, you can start the process of linking them all together. Using your knowledge gained from your prior research and conversations with other team members, you can make a preliminary attempt to define the logical relationships between your different tables.
Relationships between tables are created by linking together primary key fields and foreign key fields. A foreign key is a field in one table that references the primary key of another table. For example, if you were designing a sales pipeline database, your “Deals” table might contain a “Sales Rep” foreign key field, which would link to your “Sales Reps” table; if you were designing a content calendar database, your “Pieces” table might contain an “Editor” foreign key field and an “Author” foreign key field, both of which would link to your “Staff” table.
To start defining the relationships between your different tables, identify the potential foreign key fields. In an ideal relational database, foreign key fields are the only fields that should ever be duplicated, so if you identify any fields that appear in multiple tables, those are likely candidates for foreign key fields.
Returning to our talent agency example, we can go through our lists of fields per table and identify the fields that share names/concepts with the primary key fields of other tables:
There are three types of relationships between tables:
- One-to-one relationships, in which a record in one table is related to one and only one record in another table. An example of this might be an IT team’s asset tracking database with an “Employees” table and a “Computers” table: each employee only possesses one company-owned computer, and each company-owned computer is possessed by only one employee. (Note that one-to-one relationships are relatively uncommon.)
- One-to-many relationships, in which a record in one table can be related to one or more records in another table. An example of this might be a project tracker database which contains a “Projects” table and a “Tasks” table: each project has multiple associated tasks, but each task is only associated with one project.
- Many-to-many relationships, in which one or more records in one table can be related to one or more records in another table. An example of this might be an inventory of books, which contains a “Titles” table and an “Authors” table: each title will have been written by one or more authors, and each author can have written one or more books.
Being able to identify the different types of relationships is helpful for modeling your organization’s business rules. For example, if you specifically define a relationship as one-to-many, you can enforce a rule that the records on the “many” side of the relationship can, indeed, only ever be linked to one record in the other table. (More on this in the next step.)
One activity that can be very helpful for visualizing how your tables will relate to each other is creating an entity-relationship diagram, or an ER diagram. An ER diagram is a kind of chart that uses shapes to represent your tables and lines to represent the relationships between your tables.
You might be wondering why you would want to go through the effort of establishing the relationships between these tables. The main reason is that the table relationships actually allow you to make logical connections between pairs of tables, which in turn allows you to draw data from multiple tables simultaneously. Being able to draw data from multiple tables simultaneously means that you can make your tables more efficient and minimize redundant data. If you construct your table relationships appropriately, you can see whatever data you need to see from any table, at any time, but you’ll only ever need to enter it or modify it in a single location.
To better understand how this might work in practice, let’s return once again to our talent agency example. Suppose you wanted to be able to see at a glance how much revenue each artist has generated in total across all their gigs. Because you’ve already established a relationship between the “Gigs” and the “Artists” table, you can use that relationship to look up data from all the relevant gig records per artist.
To do this, you can create a new computed field in the “Artists” table—“Total gig revenue”—that sums the total value of “Revenue generated per gig” for all linked records in the “Gigs” table per artist. A computed field is a special kind of field that automatically generates a value using one or more values from another field in the database. Because the computed field updates automatically, if you ever edit a value in the “Revenue generated per gig” field or add any new gig records for a particular artist, the “Total gig revenue” field will also update automatically.
You can even take this one step further: if you wanted to see how much revenue each agent has generated in total through all the artists they manage, then you could use the established relationship between the “Artists” and the “Agents” table and create a new computed field in the “Agents” table—“Artists’ total gig revenue”—that sums the total value of “Total gig revenue” for all linked records in the “Artists” table per agent. Since this new computed field in the “Agents” table is connected to the “Gigs” table via the “Artists” table, it will also be updated automatically if the values in the “Gigs” table change, even though there is no direct relationship between the “Gigs” and “Agents” tables.
Step 6: establish business rules
After fully establishing the overall structure of your database, it’s time to incorporate your organization’s unique business rules into your database design. Because business rules are so closely tied to the specific ways in which your organization does its work, your most valuable resource will be, once again, managers and end users. Consulting with them can provide invaluable insight into which constraints will be the most impactful for your workflows.
Generally speaking, there are two main kinds of business rules. Field-specific business rules refer to constraints placed on specific fields. Some examples of field-specific business rules might be “Dates on our product order invoices must be displayed in the ISO format ‘2020-12-22,’” “Email addresses stored in the employee directory must be valid email addresses,” or “The only valid values that can be selected for this status field are ‘To do,’ ‘Doing,’ and ‘Done.’”
Relationship-specific business rules, which we briefly touched on earlier, refer to constraints placed on table relationships. Some examples of relationship-specific business rules might be “Each project in our video production tracker must be linked to one or more fact-checkers,” or “Each line item in a customer’s order must be linked to one and only one product.”
The most thorough method of identifying and implementing your field-specific business rules is to systematically review each field within each table to determine which business rules apply to that field. Every field will have some relevant business rules, even if those rules are as general as “Every value in the ‘Employee first name’ field should be a string composed of letters.” In a similar fashion, you can systematically review each of the relationships in your proposed database structure and assess whether or not they require any relationship-specific business rules.
In the future, you will probably need to change your existing business rules or add new business rules. For example, your team might decide that “Canceled” should be a valid potential value for a task status field that currently includes only “To do,” “Doing,” and “Done” as options. Fortunately, if you have followed the previous steps and set up a robust underlying database structure, it should be relatively easy to adjust your field- and relationship-specific business rules down the road without having to restructure your entire database.
Step 7: check your work
You’re almost done with the database design process—all you need to do is perform one final review of your database. A few questions to ask yourself as you look over every table, field, and relationship:
- Does every type of entity have its own dedicated table?
- Are there any tables that might need to be consolidated, or, conversely, decomposed into multiple tables?
- Are there any duplicate fields in my database that are not foreign keys?
- Do each of my fields have defined specifications that match our organization’s business rules?
- Do the relationships between my tables make sense?
If everything looks good to you, you should run the final product past the end users and managers who will be interacting with the database. If they’re satisfied with the end result, then rejoice! Be proud of your new, structurally sound database.