How Is a Surrogate Key Generated

  • Jan 31, 2011 A surrogate key is typically a numeric value. In SQL Server, you can define a column with the identity property to help generate surrogate key values. Before I talk about the pros and cons of natural and surrogate keys, let me first expand a little more on each type of key.
  • Jan 31, 2011 Surrogate keys are loosely similar to surrogate mothers, but the similarity stops quickly. They are keys that don't have a natural relationship with the rest of the columns in a table. The surrogate key is just a value that is generated and then stored with the rest of the columns in a record.
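  • Generally, a surrogate key is a sequential unique number generated by the database itself (for example, SQL Server). The purpose of a surrogate key is to act as the primary key. There is a slight difference between the two terms: "primary key" is the role a column plays in the table, while "surrogate key" describes a generated value with no business meaning. Ideally, every row has a surrogate key serving as its primary key alongside its natural key.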
If you are working on a data warehouse project, then you have probably heard a lot about surrogate keys. Surrogate keys are a widely accepted data warehouse design standard. In this article, we will look at data warehouse surrogate key design and its advantages and disadvantages.

What are surrogate keys in a data warehouse?

If what you choose is not a natural key but a system-generated identifier, it is called a surrogate key; in other words, you use a surrogate key as the primary key. In the above example, customerid is a surrogate key and customernumber is the natural key. These are just terms to describe the table design. Surrogate keys are mostly maintained by the system and used to make several aspects of implementation easier.

Common methods for creating surrogate key values in SQL Server are:

  • A monotonically increasing number, such as an identity column.
  • A globally unique identifier (GUID) data type.

Let's take a look at a few examples below of how we can implement surrogate keys in SQL Server.

Feb 28, 2011 Surrogate keys are made-up keys and do not appear naturally in the data. In this article Gregory A. Larsen shows you how to generate surrogate keys using an identity column.

The goal of this blog post is to identify the key benefits of replacement and surrogate keys. This blog post will also explain how to set up a form using reference groups to refer to a surrogate key field. Last, it will detail a common failure scenario when attempting to create a reference group.
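As a rough illustration of the two generation methods named above (an increasing number and a GUID), here is a minimal Python sketch. It uses the standard-library `sqlite3` module as a stand-in for SQL Server, so `AUTOINCREMENT` plays the role of an identity column; the `customer` table, its column names, and the sample customer numbers are assumptions made up for the example.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# customer_id is the surrogate key: SQLite assigns it automatically,
# much like an identity column would in SQL Server.
conn.execute(
    """
    CREATE TABLE customer (
        customer_id     INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_number TEXT NOT NULL,   -- natural (business) key
        customer_guid   TEXT NOT NULL    -- GUID-style surrogate alternative
    )
    """
)

# Hypothetical business keys; only these come from the "source system".
for number in ("C-1001", "C-1002", "C-1003"):
    conn.execute(
        "INSERT INTO customer (customer_number, customer_guid) VALUES (?, ?)",
        (number, str(uuid.uuid4())),
    )

rows = conn.execute(
    "SELECT customer_id, customer_number, customer_guid "
    "FROM customer ORDER BY customer_id"
).fetchall()

for row in rows:
    print(row)
```

The integer key is compact and ordered, while the GUID can be generated anywhere without coordinating with the database; which one fits better depends on the system.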

If you are a data warehouse developer, you might be wondering what a surrogate key is and how and where it is used. You will get answers to these questions here.

Data warehouse surrogate keys are sequentially generated meaningless numbers associated with each and every record in the data warehouse. These surrogate keys are used to join dimension and fact tables.

  • Usually, database sequences are used to generate surrogate keys, so each value is unique.
  • Surrogate keys cannot be NULL; a surrogate key column is never populated with NULL values.
  • A surrogate key holds no meaning in the data warehouse, which is why these keys are often called meaningless numbers. It is just a sequentially generated INTEGER used for better lookups and faster joins.
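The properties listed above (unique, never NULL, sequential, meaningless) can be sketched with a tiny in-memory sequence. This is only an illustration in Python, not how any particular database implements sequences; the class name `SurrogateKeySequence` and the sample records are hypothetical.

```python
import itertools


class SurrogateKeySequence:
    """Hands out ever-increasing integers, like a database sequence."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)

    def next_value(self):
        # Every call returns a new integer, so values are unique and never None.
        return next(self._counter)


seq = SurrogateKeySequence()

# Hypothetical incoming records that carry no key of their own.
records = [{"name": "ABC"}, {"name": "BCD"}, {"name": "CDE"}]

# Attach a surrogate key to each record as it is loaded.
keyed = [{"sk": seq.next_value(), **rec} for rec in records]

for rec in keyed:
    print(rec)
```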

Why are surrogate keys used in a data warehouse?

Basically, a surrogate key is an artificial key used as a substitute for the natural key (NK) defined in data warehouse tables. We could use natural keys, also called business keys, as the primary key for tables. However, this is not recommended for the following reasons:

  • Natural keys (NK), or business keys, are generally alphanumeric values (for example, prod123 or prod231). They are less suitable for indexing than integers because index traversal becomes slower.
  • Business keys are often reused after some time. This causes problems because a data warehouse maintains historic data as well as current data.

For example, product codes can be revised and reused after a few years, which makes it difficult to tell current products from historic ones. Surrogate keys are used to avoid this situation.
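To make the reuse problem concrete, here is a small Python sketch of a product dimension in which the same business key appears twice. All of the data (product names, years, the `ProductRow` structure) is invented for illustration; the point is only that the business key alone is ambiguous while the surrogate key is not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductRow:
    product_sk: int          # surrogate key, unique per row
    product_code: str        # business key, may be reused over time
    name: str
    valid_from: int          # year the code started to mean this product
    valid_to: Optional[int]  # None means "still current"


# Hypothetical dimension: code prod123 was retired and later reused.
dim_product = [
    ProductRow(1, "prod123", "Desk lamp", 2010, 2014),
    ProductRow(2, "prod123", "USB charger", 2018, None),
]


def rows_for_code(code):
    """Lookup by business key: can match several historic rows."""
    return [r for r in dim_product if r.product_code == code]


def row_for_sk(sk):
    """Lookup by surrogate key: matches exactly one row."""
    return next(r for r in dim_product if r.product_sk == sk)


print(len(rows_for_code("prod123")))  # ambiguous: two rows share the code
print(row_for_sk(1).name)             # unambiguous: the historic product
```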

Data Warehouse Surrogate Key examples

Surrogate keys are integers assigned sequentially in the dimension table, where they can be used as the primary key. The surrogate key column can be an identity column or be populated from a database sequence.

Below is a sample table with a surrogate key (PATIENT_SK) alongside the natural key (PATIENT_ID):

PATIENT_SK  PATIENT_ID  PATIENT_NAME  PATIENT_AGE
1           P001        ABC           20
2           P002        BCD           25
3           P003        CDE           19
4           P004        DEF           45
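A load step that produces a table like the one above could look roughly like the following Python sketch. It keeps a lookup from natural key to surrogate key so that a patient already seen keeps its key on later loads. The function name `assign_surrogate_keys` and the `key_map` dictionary are assumptions for this example, not part of any ETL tool.

```python
def assign_surrogate_keys(source_rows, key_map):
    """Attach PATIENT_SK to each row, reusing keys for known PATIENT_IDs.

    key_map maps natural key -> surrogate key and is updated in place.
    """
    next_sk = max(key_map.values(), default=0) + 1
    out = []
    for row in source_rows:
        patient_id = row["PATIENT_ID"]
        if patient_id not in key_map:
            # First time this natural key is seen: hand out a new number.
            key_map[patient_id] = next_sk
            next_sk += 1
        out.append({"PATIENT_SK": key_map[patient_id], **row})
    return out


source = [
    {"PATIENT_ID": "P001", "PATIENT_NAME": "ABC", "PATIENT_AGE": 20},
    {"PATIENT_ID": "P002", "PATIENT_NAME": "BCD", "PATIENT_AGE": 25},
    {"PATIENT_ID": "P003", "PATIENT_NAME": "CDE", "PATIENT_AGE": 19},
    {"PATIENT_ID": "P004", "PATIENT_NAME": "DEF", "PATIENT_AGE": 45},
]

key_map = {}
dim_patient = assign_surrogate_keys(source, key_map)

for row in dim_patient:
    print(row)
```

Running the function again with a mix of known and new patients reuses the existing keys and continues numbering from the highest one.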

Advantages of Surrogate Key

Below are some of the advantages of using surrogate keys in a data warehouse:

  • With the help of surrogate keys, you can integrate heterogeneous data sources into the data warehouse even if they don't have natural or business keys.
  • Joining fact and dimension tables on a surrogate key is faster, which gives better performance.
  • Surrogate keys are very helpful for ETL transformations.
  • Data warehouse surrogate keys are usually small integers, which makes for smaller indexes and better performance.
  • Surrogate keys are required if you are implementing slowly changing dimensions (SCD).
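The fact-to-dimension join mentioned in the list above can be sketched as follows. The example uses Python's `sqlite3` module with an in-memory database; the `dim_patient` and `fact_visit` tables, their columns, and the charge amounts are hypothetical and exist only to show the join on the integer surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The fact table stores only the integer surrogate key of the dimension row.
conn.executescript(
    """
    CREATE TABLE dim_patient (
        patient_sk   INTEGER PRIMARY KEY,
        patient_id   TEXT NOT NULL,
        patient_name TEXT NOT NULL
    );
    CREATE TABLE fact_visit (
        visit_id   INTEGER PRIMARY KEY,
        patient_sk INTEGER NOT NULL REFERENCES dim_patient (patient_sk),
        charge     REAL NOT NULL
    );
    INSERT INTO dim_patient VALUES (1, 'P001', 'ABC'), (2, 'P002', 'BCD');
    INSERT INTO fact_visit VALUES (1, 1, 50.0), (2, 1, 25.0), (3, 2, 40.0);
    """
)

# Join on the surrogate key, then report by the business key.
totals = conn.execute(
    """
    SELECT d.patient_id, SUM(f.charge)
    FROM fact_visit AS f
    JOIN dim_patient AS d ON d.patient_sk = f.patient_sk
    GROUP BY d.patient_id
    ORDER BY d.patient_id
    """
).fetchall()

print(totals)
```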

Disadvantages of Surrogate Key

Below are some of the disadvantages of using surrogate keys in a data warehouse:

  • Surrogate key generation and assignment places an additional burden on the ETL framework.
  • You should not overuse surrogate keys, as they don't have any meaning in data warehouse tables.
  • Data migration becomes difficult if you have a database sequence associated with surrogate key columns. You must take care to continue the key numbering correctly in the new database; otherwise you may end up with duplicate surrogate keys.
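The migration pitfall in the last point can be illustrated with a short Python sketch: a fresh sequence in the new database restarts at 1 and collides with migrated rows, whereas a sequence reseeded from the highest migrated key does not. The helper `reseeded_sequence` and the key values are assumptions for this example; real databases each have their own mechanism for resetting a sequence.

```python
import itertools


def reseeded_sequence(existing_keys):
    """Start numbering just above the largest key already present."""
    return itertools.count(max(existing_keys, default=0) + 1)


# Surrogate keys that came over with the migrated rows (hypothetical).
migrated_keys = [1, 2, 3, 4]

# A naive, fresh sequence restarts at 1 and collides immediately.
naive = itertools.count(1)
collides = next(naive) in migrated_keys

# A reseeded sequence continues after the migrated data.
seq = reseeded_sequence(migrated_keys)
new_keys = [next(seq) for _ in range(2)]

print(collides)
print(new_keys)
```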

By: Ben Snaidero Updated: 2018-04-16

Problem

If you polled any number of SQL Server database professionals and asked the question, 'Which is better when defining a primary key, having surrogate or natural key column(s)?', I'd bet the answer would be very close to a 50/50 split. About the only definitive answer you will get on the subject is most people agree that when implementing a data warehouse, you have to use surrogate keys for your dimension and fact tables. This is because a source system can change at any time due to business requirements and your data warehouse should be able to handle these changes without needing any updates. This tip will go through some of the pros and cons of each type of primary key so that you can make a better decision when deciding which one to implement in your own environments.

Solution

Before we get into the pros and cons let's first make sure we understand the difference between a surrogate and natural key.

Surrogate Key Overview

A surrogate key is a system-generated (could be GUID, sequence, etc.) value with no business meaning that is used to uniquely identify a record in a table. The key itself could be made up of one or multiple columns. The following diagram shows an example of a table with a surrogate key (AddressID column) along with some sample data. Notice the key itself has no business meaning, it's just a sequential integer.

Natural Key Overview

A natural key is a column or set of columns that already exist in the table (e.g. they are attributes of the entity within the data model) and uniquely identify a record in the table. Since these columns are attributes of the entity they obviously have business meaning. The following is an example of a table with a natural key (SSN column) along with some sample data. Notice that the key for the data in this table has business meaning.

Since this topic has been debated for years with no definitive answer as to which is better I thought with this tip I would put together a list of all the pros and cons of each type of key. This list can then be used as a reference when deciding what type of key would be best suited for your own environment/application. After all, everyone's requirements are different. What works or performs well in one application might not work so well in another.

Natural Key Pros

  • Key values have business meaning and can be used as a search key when querying the table
  • Column(s) and primary key index already exist, so no extra disk space is required for the additional column/index that a surrogate key would need
  • Fewer table joins since join columns have meaning. For example, this can reduce disk IO by not having to perform extra reads on a lookup table

Natural Key Cons

  • May need to change/rework key if business requirements change. For example, if you used SSN for your employee as in the example above and your company expands outside of the United States not all employees would have a SSN so you would have to come up with a new key.
  • More difficult to maintain if key requires multiple columns. It's much easier from the application side dealing with a key column that is constructed with just a single column.
  • Poorer performance since key value is usually larger and/or is made up of multiple columns. Larger keys will require more IO both when inserting/updating data as well as when you query.
  • Can't enter record until key value is known. It's sometimes beneficial for an application to load a placeholder record in one table then load other tables and then come back and update the main table.
  • Can sometimes be difficult to pick a good key. There might be multiple candidate keys each with their own trade-offs when it comes to design and/or performance.

Surrogate Key Pros

  • No business logic in key so no changes based on business requirements. For example, if the Employee table above used an integer surrogate key you could simply add a separate column for SIN if you added an office in Canada (to be used in place of SSN)
  • Less code if maintaining same key strategy across all entities. For example, application code can be reused when referencing primary keys if they are all implemented as a sequential integer.
  • Better performance since key value is smaller. Less disk IO is required when accessing single column indexes.
  • Surrogate key is guaranteed to be unique. For example, when moving data between test systems you don't have to worry about duplicate keys since a new key will be generated as data is inserted.
  • If a sequence is used then there is little index maintenance required since the value is ever increasing, which leads to less index fragmentation.

Surrogate Key Cons

  • Extra column(s)/index for surrogate key will require extra disk space
  • Extra column(s)/index for surrogate key will require extra IO when inserting/updating data
  • Requires more table joins to child tables since data has no meaning on its own.
  • Can have duplicate values of natural key in table if there is no other unique constraint defined on the natural key
  • Difficult to differentiate between test and production data. For example, since surrogate key values are just auto-generated values with no business meaning it's hard to tell if someone took production data and loaded it into a test environment.
  • The key value has no relation to the data, so technically the design breaks 3NF
  • The surrogate key value can't be used as a search key
  • Different implementations are required based on database platform. For example, SQL Server identity columns are implemented a little differently than they are in Postgres or DB2.
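One of the cons listed above, duplicate natural key values slipping in behind a surrogate primary key, is commonly mitigated by declaring a separate unique constraint on the natural key. The following Python sketch shows the idea with `sqlite3`; the `employee` table, the placeholder SSN value, and the names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# employee_id is the surrogate primary key; the UNIQUE constraint on ssn
# stops the same natural key from being inserted twice.
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY AUTOINCREMENT,
        ssn         TEXT NOT NULL UNIQUE,
        name        TEXT NOT NULL
    )
    """
)

insert_sql = "INSERT INTO employee (ssn, name) VALUES (?, ?)"
conn.execute(insert_sql, ("000-00-0001", "Alice"))

try:
    # Same natural key again: a new surrogate key would be generated,
    # but the unique constraint rejects the row.
    conn.execute(insert_sql, ("000-00-0001", "Alice again"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

print(duplicate_rejected, row_count)
```

The trade-off is the extra index behind the unique constraint, which adds to the disk space and IO costs already noted for surrogate keys.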

Summary

As mentioned above it's easy to see why this continues to be debated. Each type of key has a similar number of pros and cons. If you read through them, though, you can see how, based on your requirements, some of the cons might not even apply in your environment. If that's the case then it makes it much easier to decide which type of key is the best fit for your application.

Next Steps
  • Read more tips on SQL Server constraints
  • Read other tips on data warehousing
  • Read more information on auto-generated keys in SQL Server

About the author
Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.