Pythian Blog: Technical Track

Distribution Cleanup job and xp_cmdshell

Recently I encountered a SQL Server 2005 Replication setup where the following error message was appearing about every 5 minutes in the SQL Server error logs.

Replication-@rowcount_only parameter must be the value 0,1, or 2. 0=7.0 compatible checksum. 1=only check rowcou: agent distribution@rowcount_only parameter must be the value 0,1, or 2. 0=7.0 compatible checksum. 1=only  scheduled for retry. Could not clean up the distribution transaction tables.

The error message was coming from the failing distribution cleanup job. Examining the job history I was able to determine the real error message.

Executed as user: PROD\SQLAgent. Could not remove directory '\\dummy\Replication\unc\DUMMY_TRANSLOG_TRANSLOG\20080205230108\'. Check the security context of xp_cmdshell and close other processes that may be accessing the directory. [SQLSTATE 42000] (Error 20015). The step failed.

To provide some background information on the Distribution Cleanup Job. When a server is configured to be a Distributer a job is created titled ‘Distribution clean up: distribution’. The job has one step and it calls a stored proc called dbo.sp_MSdistribution_cleanup where it will clear out records from the MSrepl_commands and MSrepl_transactions tables. The cleanup procedure also deletes any expired snapshots stored in the Replication folder.

The first step in debugging was to confirm if xp_cmdshell was enabled. I opened SQL Server Surface Area Configuration and selected “Surface Area Configuration for Features”, selected the xp_cmdshell and checked the enable box. Xp_cmdshell is disabled by default in SQL Server 2005 for security reasons.

The second part of debugging was to make sure that the PROD\SQLAgent account had proper permissions to run xp_cmdshell statements. Only members of the sysadmin role has permission to create jobs with steps that run xp_cmdshell, if PROD\SQLAgent is not a member of sysadmin role, the job could be given permission to run xp_cmdshell by using a proxy account.

The final part of debugging was to ensure that proper permissions were set to delete the specified folder. The stored procedure dbo.sp_MSdistribution uses the unc path, not the local path to the folder. I configured the SQL Server service account to have full control on the replication share. There is a distinction between the permissions set on the folder and the network share.

The reason I set the SQL Server service account instead of PROD\SQLAgent to have full control on the share is because when an xp_cmdshell command it runs on the os as the SQL Server service account not the user issuing the command. To prove this I have run the following commands

C:\>echo %USERNAME%
pythian

C:\>osql -E
1> xp_cmdshell 'echo %USERNAME% > c:\whoami.txt'
2> go
 output

 -------------------------------------------------------------------------------

        ------------------------------------------------------------------------

        ------------------------------------------------------------------------

        --------------------------------
 NULL

(1 row affected)
1> exit

C:\>more whoami.txt
NETWORK SERVICE

As we can see in the previous test that the xp_cmdshell is run under the NETWORK SERVICE account instead of the pythian user which I was logged in as.

After making these changes the distribution clean up job was able to run successfully and the errors stopped occurring in the Error logs.

No Comments Yet

Let us know what you think

Subscribe by email