Eden Ridgway's Blog

.Net and Web Development Information

  Home :: Contact :: Syndication  :: Login
  105 Posts :: 1 Stories :: 78 Comments :: 3 Trackbacks

Search

Article Categories

Archives

Post Categories

Development

General

When clearing database tables, one knows that certain tables must be deleted before others, but others can be deleted at any point. Hence we want to establish a partial order of the delete operations in our cleanup script. A Topological Sort algorithm satisfies this requirement. As the name suggests it evaluates the interconnected nodes [in our case tables] to determine which come first.

So if we were to look at the database structure on the right, we would want the UserTeam table cleared before the Team and User tables. The Document table would also need to be cleared before we could clear out the user table. Hence any of the following clearance orders would satisfy our needs:

  1. UserTeam, Document, User, Team
  2. Document, UserTeam, Team, User
  3. Document, UserTeam, User , Team
  4. UserTeam, Team, Document, User

I think the list above makes the concept of a partial order pretty clearly since no one order is better than the other. Obviously as one introduces more dependencies so the order will change and the number of combinations grows significantly. The list however is actually the inverse of a topological sort since it starts of with the tables that depend on other tables and ends with tables that have no dependencies.

In order to sort the relationships correctly we will have to traverse down the relationship tree until we find a table that has no relationships and then add it to our sorted list. We then pick another table and do the same until there are no unprocessed tables. The flowchart below illustrates the logic that I will use to implement the sort.

Doing this in TSQL is actually quite easy and I'll highlight the code I used to implement the flowchart logic:

  1. Compile a list of tables
    CREATE TABLE #Tables (
    TableName
    varchar(100)
    )

    INSERT INTO #Tables (
    TableName
    )
    SELECT
    Table_Name
    FROM
    INFORMATION_SCHEMA.TABLES
    WHERE
    TABLE_TYPE = 'BASE TABLE'
  2. Get a list of foreign key relationships

    CREATE TABLE #Dependencies (
    PKTable
    varchar(100),
    FKRefTable
    varchar(100)
    )

    INSERT INTO #Dependencies (
    PKTable,
    FKRefTable
    )
    SELECT
    PKTABLE.TABLE_NAME as PKTable,
    FKTABLE.TABLE_NAME
    as FKRefTable
    FROM
    INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS REFCON

    INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS FKTABLE ON
    FKTABLE.CONSTRAINT_NAME = REFCON.CONSTRAINT_NAME

    INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS PKTABLE ON
    PKTABLE.CONSTRAINT_NAME = REFCON.UNIQUE_CONSTRAINT_NAME

    ORDER BY
    FKRefTable
  3. While there are unprocessed tables, traverse the relationship tree of each table until I reach a table (or node) that has no dependencies and add it to the sorted list
    DECLARE @CurrentTable varchar(100)
    DECLARE @DependantTable varchar(100)
    DECLARE @PreviousRelatedTable varchar(100)

    SET @PreviousRelatedTable = ''
    SET @CurrentTable = (SELECT TOP 1 TableName FROM #Tables)

    WHILE @CurrentTable != ''
    BEGIN

    SET
    @DependantTable = ''

    SELECT TOP 1
    @DependantTable = PKTable
    FROM
    #
    Dependencies
    WHERE
    FKRefTable = @CurrentTable AND
    PKTable != @CurrentTable
    /* ----------------------------------------------------
    If this table does not have any dependencies on other tables
    then add it to the sorted tables table and then remove it from
    the tables list.
    ----------------------------------------------------- */
    IF @DependantTable = ''
    BEGIN

    INSERT INTO #
    SortedTables (
    TableName
    )
    VALUES (
    @CurrentTable
    )

    --Remove all references to the removed table to allow
    --object higher up in the hierarchy to be added later
    DELETE FROM #Dependencies
    WHERE
    PKTable = @CurrentTable

    --Remove the current table from the tables list
    DELETE FROM #Tables
    WHERE
    TableName = @CurrentTable

    --Move on to the next table
    SET @CurrentTable = ''
    SET @CurrentTable = (SELECT TOP 1 TableName FROM #Tables)
    END

    /* --------------------------------------------------------
    If a dependant table was found then move on to that
    table for the next loop
    -------------------------------------------------------- */
    ELSE
    BEGIN
    SET
    @PreviousRelatedTable = @CurrentTable
    SET @CurrentTable = @DependantTable
    END
    END

The end result in this case is the following script:

DELETE FROM [UserTeam]
DELETE FROM [Team]
DELETE FROM [Document]
DELETE FROM [User]

The complete script is available here Clear DB - Topological Sort Example.zip.

Obviously the same approach without the list reversal can be used to determine the order in which to script out tables and data.

posted on Thursday, January 04, 2007 7:09 AM
Comments have been closed on this topic.