AG Management

AG in this case means Availability Group, a feature called Always-On Availability Groups, introduced since SQL Server 2012.

Whilst the databases in the AG are kept in sync for changes, there is no built-in synchronisation method for logins, SQL Agent jobs etc.

Here is how I manage jobs…

All jobs are created on all the nodes in the AG and they all have the same schedule and they are all enabled.

Each job has a check to see if it is the “active node”.

If it is a single step job, then build this into each job on each node:

DECLARE @dbname NVARCHAR(128) = ‘<your AG db name goes here>’

IF sys.fn_hadr_is_primary_replica(@dbname) = 1

    BEGIN

        EXEC database.schema.SP

    END

ELSE

    RAISERROR (‘This is not the primary’, 0, 0)

It is very important that the job has a DB of master. This is because, when you script out the job and apply it to the other nodes with the replica database as the db to execute in, when the job runs, it will generate an error like “Cannot log onto server (local)”. This is because it is trying to start in a DB that is a replica DB, which we all know are offline for using. By performing this check, it will execute on the active node or it will exit gracefully. RAISERROR with a severity lower than 11 will not cause the job to “fail”. Therefore, make sure that the database name to execute the SP is included in the EXECUTE.

The second way to do this is with multi-step jobs like so;

The very first job step is this (setup in master) –

DECLARE @dbname NVARCHAR(128) ='<your AG db name goes here>’

IF sys.fn_hadr_is_primary_replica(@dbname) = 1

    BEGIN

        RAISERROR(‘This is the primary’, 0, 0)

    END

ELSE

    RAISERROR (‘This is not the primary’, 11, 0)

What is happening here is, if it is the primary node, the RAISERROR with the message “This is the Primary” is a low enough severity to have the job succeed and move to the next step – have your on success setting to go to the next step on success and on failure setting go to the last step which I will detail in a sec. If this is not the primary node, the job will fail but in an effort to make sure we don’t have a “job failure” when it really isn’t, we have to go to the last step in the job which is simply:

PRINT ‘This is not the primary database replica, exiting without error’

It is important that this step can only be accessed from the first step on failure therefore, make sure that your actual jobstep has quit on success and quit on failure so you can get alerted on actual job failures.

Not the use of the severity values in the RAISERROR statements

This entry was posted in Availability Groups. Bookmark the permalink.