MSSQL TempDb full

Question

I have a problem with my MSSQL cluster. When switching to SQL Server 2016 (from 2014) an error occurs with the tempdb, which is full after a certain time for no apparent reason.
Is there a proven way to monitor or avoid this behavior?
Because every time the tempdb is full, the entire server does not work and rejects every query.
The only solution that works when the error occurs is to increase the tempdb log and temdb dev files.
Currently the tempdb size for a 700 GB database is approximately 90 GB.
Thanks for everyone who can help!

David Spillett · Answer

Is there a proven way to monitor or avoid this behavior?

TempDB is a database like any other in many respects, so monitoring tools that can watch metrics like space allocated & used and raise alerts based on those metrics can be used to monitor TempDB too. As a quick proof of that run
USE tempdb
EXEC sp_spaceused

in SSMS.

Currently the tempdb size for a 700 GB database is approximately 90 GB.

If TempDB is constantly growing like this and the space remains used (not just allocated) which it sounds like it is temporarily as you need to increase the allocation in order to move on, then it sounds like you have long-running, possibly blocked, sessions that are holding a lot of data in TempDB. You could use diagnostic tools like sp_whoisactive (run in the app database, not tempdb) at such times to list active sessions at the time you have a problem. From sp_whoisactive's documentation:

The [tempdb_allocations] column is collected directly from the tempdb-related DMVs, and indicates how many pages were allocated in tempdb due to temporary tables, LOB types, spools, or other consumers. The [tempdb_current] column is computed by subtracting the deallocated pages information reported by the tempdb DMVs from the number of allocations. Seeing a high number of allocations with a small amount of current pages means that your query may be slamming tempdb, but is not causing it to grow. Seeing a large number of current pages means that your query may be responsible for all of those auto-grows you keep noticing.

If you spot a problem session (or several problem sessions) the sql_text and sql_command outputs from it run with default options and with @get_full_inner_text=1, @get_outer_command=1 may help identify the processes that are either running away in loops or getting snarled up in lock waits that you need to address, or stopping and not closing (due to an application problem, or a human leaving a connection open in a tool like SSMS).
There are many things that could cause sessions to balloon TempDB this way (perhaps a bug where you have an accidental CROSS JOIN that is spooling to TempDB, the size of that dump getting larger each time as your data grows, to pick one example - I've seen that before where someone has "fixed" it by adding DISTINCT instead of actually addressing the errant/missing join predicates). It may also be that rather than a few errant sessions you have a glut of concurrent ones that use TempDB a lot and rather than being blocked by locks they are simple taking a lot of time to complete due to IO contention - if that is the case then in the run-up to things stalling because TempDB becomes full you will see your drives or network being pummelled.
Once the space in TempDB is released these diagnostics will be less useful so you'll need to look at other things, but from your description it sounds like the space is still allocated so it is trying to grow as it needs more.

Stephen Morris - Mo64 · Answer

I think the definitive guide to troubleshooting tempdb out of space errors is by Sunil Agarwal who is a Program Manager at Microsoft on the SQL team :-
https://docs.microsoft.com/en-us/archive/blogs/sqlserverstorageengine/tempdb-monitoring-and-troubleshooting-out-of-space

MSSQL TempDb full

2 Answers

Add your own answers!

Ask a Question