Why is index not being used? Can I replicate plan?

Question

I have been tuning a query/ indexes in a dev environment, but cannot replicate the new query plan when applying the same changes to a UAT environment.
Specifically, in the UAT environment the optimizer chooses to ignore a particular index and instead performs a nonclustered index seek on an existing unique constraint, followed by a key lookup on the clustered index.
An abridged version of the query is:

select  d.dim_date_id 
       ,   f.dim_form_id
       ,   d.date

INTO #TMP
from    DWH.dbo.tbl_fact_outcome f
join    DWH.dbo.tbl_dim_date d      ON d.date >= DATEADD(DAY,-1,f.known_from) and d.date < f.known_to
join    DWH.dbo.tbl_dim_form df     ON  f.dim_form_id = df.dim_form_id
join    DWH.dbo.tbl_dim_question Q  ON f.dim_question_id = Q.dim_question_id

where   (d.flag_latest_day = 'Y' or d.flag_end_of_month = 'Y'  or (d.flag_end_of_week = 'Y' AND d.flag_latest_week = 'Y'))
       and d.flag_future_day = 'N'
       and df.flag_latest_form = 'Y'
       and f.deleted = 0           
       and q.question_key like 'R%'

and d.date >= '14/07/2020'

The index in question is:
CREATE NONCLUSTERED INDEX [ix_tbl_dim_date_flag_future_day_includes] ON [dbo].[tbl_dim_date]
([flag_future_day] ASC)
INCLUDE([dim_date_id],[date],[flag_end_of_month],[flag_latest_day],[flag_end_of_week],[flag_latest_week])

It's not a large table and should only return one or two dates which filter results from subsequent joins accordingly. Adding other fields to key columns in index made no difference in the dev environment - the only improvement came from the order/ shape of the query plan.

The dev query plan (good) is here : https://www.brentozar.com/pastetheplan/?id=SyZjIGA1v
The UAT query plan (bad) is here : https://www.brentozar.com/pastetheplan/?id=BkBBLzCyv

Both environments have the exact same table structure and indexes and are using the same query.
The dev environment does have around half the volume of data overall, but even if I amend the date range in either to return a similar number of records the query plans in both environments stay as they are.
There was a dramatic improvement on performance in the dev environment since this query plan has been used and I am fairly confident the same would be seen in my UAT environment.
I have tried using query hints to force the index and/ or the order but haven't been able to replicate this plan.
Essentially I have two questions:

Why would the optimizer choose a key lookup over this index?
Is there anything I can do to force it to follow the shape seen in the dev environment?

I am using SQL Server 2014 enterprise edition.

EDIT - 20/07/2020 - Actual execution plans added
Actual good execution plan from dev environment - running this took 37 seconds:
https://www.brentozar.com/pastetheplan/?id=BJkr-Q7gP
Actual bad execution plan from UAT - running this took 33m 42s: https://www.brentozar.com/pastetheplan/?id=H1nsxX7ew

Learning_DBAdmin · Answer

This is not a direct answer to your question however few observation which might put you in right direction to solve the issue. First few observations from actual plan:

Number of rows in table tbl_dim_form is different. In good plan , number of rows are 6653269 whereas in bad plan it is 9387471. Which is close to 40% more.
Same is the case with table tbl_fact_outcome, In good plan, number of rows are 28011736 whereas for bad plan, it is 65679017. This is more than double value.
I didn't check other tables however it seems volume of data is totally different between dev and UAT environment.

You may read more on this link how number of rows could affect query plan and other joining conditions.
I downloaded the XML for good and bad plan. If you use SentryOne Plan explorer, below is the one which is highlighted in bad plan:

Number of rows read is in billions and it is 3 times than that of actually required.

In the case of good plan, I/O residual is much lesser.
So, you need to solve this issue first and also point raised by David Browne on table tbl_fact_outcome. Below are some of the pages where it could help you:
https://sqlperformance.com/2016/06/sql-indexes/actual-rows-read-warnings-plan-explorer
"Warnings: Operation caused residual I/O" versus key lookups
https://www.sqlshack.com/the-impact-of-residual-predicates-in-a-sql-server-index-seek-operation/
I hope this helps you.

John · Answer

This was totally a workaround, but it worked for me.  I had 4 servers running the same database with different data sets.  Three of them would chose an optimal plan, but the fourth would never choose that plan, and was consistently taking significantly longer.  In my case I could see the good plans seeking one of the tables for a handful of rows out of around 100,000 first, then doing the other joins.  The bad plan would do that seek last and would seek that table 500,000 plus times.  I also tried the query hints and re-writing the query in different ways with no luck.
So my work around was to manually query that table for those handful of rows first into a temp table, then join to that instead of the full table.
So in your case could you run something like this first into a temp table and then join against it instead of the whole DWH.dbo.tbl_dim_date table:
select  d.dim_date_id 
   ,d.date
   ,d.flag_future_day 
from    DWH.dbo.tbl_dim_date d 
where   (d.flag_latest_day = 'Y' or d.flag_end_of_month = 'Y'  
          or (d.flag_end_of_week = 'Y' AND d.flag_latest_week = 'Y'))
   and d.flag_future_day = 'N'
   and d.date >= '14/07/2020'

Not sure in your case if the above query would be selective enough to be really helpful and I realize that it is a workaround. But sometimes workarounds work!
Another thought, are you able to modify that NCI you included? It might help if you added the date column to the indexed columns instead of the INCLUDE columns. There are a lot of filters done in that query based on that date column.

Why is index not being used? Can I replicate plan?

2 Answers

Add your own answers!

Ask a Question