Pythian Blog: Technical Track

SQL Server: More light shed on "non-yielding scheduler" and Indexes stats

SQL server folks may find the "non-yielding scheduler" warning familiar and it is usually (if not always) associated with a stack dump. The "non-yielding scheduler" message comes with warnings like the following:
Process %ld:%ld:%ld (0x%lx) Worker 0x%p appears to be non-yielding on Scheduler %ld. Thread creation time: %time. Approx Thread CPU Used: kernel 0x ms, user 0x ms. Process Utilization 0x%. System Idle 0x%. Interval: 0x ms.
This means that there is a background thread that keeps running and making other worker threads yielding back, in time. There are many causes of the "non-yielding scheduler" error. Memory dump analysis is necessary to determine the cause. Look up the error in MS Support Advanced Search and you will find plenty of KBs dedicated to it. You may also find (not all cases) errors 17883, 17884 logged in the event viewer, containing information like this:
Event Type: Information Event Source: MSSQLSERVER Event Category: (2) Event ID: 17884 Date: Time: User: Computer: Description: New queries assigned to process on Node 0x have not been picked up by a worker thread in the last 0x seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 0x%. System Idle: ox%%. Event Type: Information Event Source: MSSQLSERVER Event Category: (2) Event ID: 17883 Date: Time: User: Computer: Description: Process Worker appears to be non-yielding on Scheduler . Thread creation time: . Approx Thread CPU Used: kernel ms, user ms. Process Utilization %. System Idle %. Interval: ms.
There is a must-read whitepaper describing SQL Server Scheduling and how to interpret and diagnose Errors 17883, 17884, 17887, and 17888; please look here How To Diagnose and Correct Errors 17883, 17884, 17887, and 17888 Recently, we had a client having same issue. The client runs a busy online business with thousands of connected users; databases sometimes include more than half a million tables. We had a healthy counters and server didn't seem to suffer from memory or I/O bottlenecks. We did suspect, however, that it is something to do with SQL Server internals, as we also received fatal exceptions pertaining to SPIDs <50 (system processes). We looked at any Hardware Fault, outdated drivers, tempdb contention ( there were nearly 0.5 datafiles/core). We opened a case with MS support after the dumps continued to happen and their reply was:
After analyzing the dumps, we see that the thread has non-yielded while updating index usage stats. These stats are returned through DMV sys.dm_db_index_usage_stats.
According to MS, the issue is a known bug that has been corrected in SQL Server 2008 but isn't expected to be resolved in SQL Server 2005.

Workaround

Enable trace flag 2330 to prevent the update of data in DMV in sys.dm_db_index_usage_stats. This DMV retrieves information that about index usage; it does not, however, generate this information itself, but rather just returns this info from a cache inside SQL Server. This cache is empty when the server instance beside it doesn't persist between instance restarts. The impact of disabling this job was nothing to the server operations. There was a selective indexes maintenance job that depended completely on sys.dm_db_index_physical_stats, which is a different DMV. Also, the query optimizer should be only affected if we stopped updating statistics, or are no longer optimizing the indexes which we were not doing here. The affected DMV helps with auditing the indexes usage when we opt to diagnose instance performance, as it helps tracking the number of scan operations, the number of seek operations, the number of lookup operations, and some more. It is worthwhile to note that sys.dm_db_index_operational_stats can report similar statistics (there are some differences though) as sys.dm_db_index_physical_stats. Craig Freedman wrote a good blog: What is the difference between sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats? Two days ago---and that's the reason I'm writing this blog---Microsoft released a "FAST PUBLISH" KB: ( Query performance decreases when sys.dm_db_index_usage_stats has large number of rows) about this issue and it mentions the following scenario:
  • In Microsoft SQL Server 2005, you frequently perform DDL operations that involve dropping and recreating of lot of tables (especially temporary tables in tempdb database).
  • You have a large number of entries (100,000 or more) in the sys.dm_db_index_usage_stats dynamic management view (DMV).
This aligns with what we had with the affected server (more than half a million tables can easily generate that). The KB mentions these workarounds:
  1. Reduce the rate at which you create or drop temporary tables. (This may need extensive code change.)
  2. Upgrade to SQL Server 2008. (Unfortunately, not everyone can afford that.)
  3. Configure your SQL Server to use trace flag T2330 as a startup parameter. (This is what we have done and adieu la problème.)
Note: make sure that the trace is enabled by running DBCC TRACESTATUS(-1) You can also look in the SQL Server error log and note whether the trace flags are mentioned, something like this: Time Server -T1222 Time Server -T845 Time Server -T2330 One more KB that was released recently: ( FIX: Error message in the SQL Server 2005 Errorlog file after the SQL Server service stops responding: "Timeout occurred while waiting for latch"); mentions similar symptoms that are related to SQL Server processing missing index information for a dynamic management view (DMV). This was fixed in SQL server 2005 Sp3 CU6 If you face similar issues, you should install the latest service pack and cumulative update, and check if you have any resource bottlenecks. You can try trace T2330 as well, and if you still get errors or dumps, then you had better open a case with Microsoft Support to analyze the dumps and provide a resolution. Hope this helps!

No Comments Yet

Let us know what you think

Subscribe by email