Optimizing my query

Database Administrators Asked by Kilazur on October 27, 2021

Here’s my query:

DECLARE @monthStartDate date;
DECLARE @monthEndDate date;
    
SET @monthStartDate = DATEFROMPARTS(2020,6,1);
SET @monthEndDate = EOMONTH(@monthStartDate);
     
SELECT @monthStartDate, MIN([Date]), @monthEndDate, GroupId, TypeId,
    COUNT(UserID) as 'CountTotal',
    SUM(CASE WHEN [IsA] = 1 THEN 1 ELSE 0 END) as 'CountA',
    SUM(CASE WHEN [IsB] = 1 THEN 1 ELSE 0 END) as 'CountB',
    SUM(CASE WHEN [IsC] = 1 THEN 1 ELSE 0 END) as 'CountC',
    SUM(CASE WHEN [IsEven] = 1 THEN 1 ELSE 0 END) as 'CountEven',
    SUM(CASE WHEN [IsOdd] = 1 THEN 1 ELSE 0 END) as 'CountOdd',
    SUM(CASE WHEN [IsEven] = 1 AND [IsA] = 1 THEN 1 ELSE 0 END) as 'CountAEven',
    SUM(CASE WHEN [IsOdd] = 1 AND [IsA] = 1 THEN 1 ELSE 0 END) as 'CountAOdd',
    SUM(CASE WHEN [IsEven] = 1 AND [IsC] = 1 THEN 1 ELSE 0 END) as 'CountCEven',
    SUM(CASE WHEN [IsOdd] = 1 AND [IsC] = 1 THEN 1 ELSE 0 END) as 'CountCOdd',
    SUM(CASE WHEN [IsEven] = 1 AND [IsB] = 1 THEN 1 ELSE 0 END) as 'CountBEven',
    SUM(CASE WHEN [IsOdd] = 1 AND [IsB] = 1 THEN 1 ELSE 0 END) as 'CountBOdd'
FROM MyTable
WHERE [Date] >= @monthStartDate AND [Date] <= @monthEndDate
GROUP BY [Date], GroupId, TypeId
ORDER BY Date, GroupId, TypeId

which selects from this table:

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [MyTable](
    [UserID] [varchar](32) NOT NULL,
    [Date] [date] NOT NULL,
    [GroupId] [varchar](16) NOT NULL,
    [IsEven] [bit] NOT NULL,
    [IsOdd] [bit] NOT NULL,
    [TypeId] [int] NOT NULL,
    [IsA] [bit] NOT NULL,
    [IsB] [bit] NOT NULL,
    [IsC] [bit] NOT NULL,
 CONSTRAINT [PK_Http_Uniques] PRIMARY KEY CLUSTERED 
(
    [Date] DESC,
    [GroupId] ASC,
    [UserID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [MyTable]  WITH CHECK ADD  CONSTRAINT [CK_MyTable_Period] CHECK  (([IsEven]<>(0) OR [IsOdd]<>(0)))
GO

ALTER TABLE [MyTable] CHECK CONSTRAINT [CK_MyTable_Period]
GO

It works, but the table will eventually hold millions of rows, so I'm understandably concerned about performance.

Do you see a way to optimize this query? Thanks.

2 Answers

Indexing on these fields would be key: [Date], GroupId, TypeId
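
As a concrete (hypothetical) sketch against the schema in the question — the index name is illustrative, the key columns match the GROUP BY, and the INCLUDE list covers everything else the query reads, so the query could be answered from the index alone:

```sql
-- Covering index: key matches the GROUP BY / ORDER BY columns,
-- INCLUDE carries every other column the SELECT list touches.
CREATE NONCLUSTERED INDEX IX_MyTable_Date_GroupId_TypeId
ON MyTable ([Date], [GroupId], [TypeId])
INCLUDE ([UserID], [IsA], [IsB], [IsC], [IsEven], [IsOdd]);
```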

However, whatever date range is "typical" for your query, I would set up partitions of that size. That is, if you typically query a month of data and partition by month, the query will only touch a single partition (i.e., the rows for that month), no matter how much data the table holds. The optimizer will eliminate the other partitions as long as you partition on the same column you filter on. Whether the table has 10 rows or 100M, you will see the same or similar performance.
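
A hedged sketch of monthly partitioning (the function name, scheme name, and boundary dates below are illustrative, not from the question):

```sql
-- One partition per month; RANGE RIGHT puts each boundary date
-- into the partition to its right, so each partition is one month.
CREATE PARTITION FUNCTION pfMonthly (date)
AS RANGE RIGHT FOR VALUES ('2020-05-01', '2020-06-01', '2020-07-01');

-- Map every partition to the same filegroup for simplicity.
CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly ALL TO ([PRIMARY]);

-- The clustered primary key would then be re-created on the scheme,
-- partitioned by the [Date] column it already leads with:
--   ... PRIMARY KEY CLUSTERED ([Date] DESC, [GroupId], [UserID])
--       ON psMonthly([Date])
```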

Answered by Doug B on October 27, 2021

If you're attempting to optimize aggregation queries in SQL Server, the single best tool you can put to work is the columnstore index. Columnstore indexes are specifically built to handle aggregation of exactly this type of data, especially when we're talking millions of rows or more.

The original question doesn't say which version of SQL Server you're running, but if you're on 2016 or later, you can make the best possible use of your storage and your indexing. If the majority of the queries against your table are analytical in nature, you can store the table as a clustered columnstore and then add nonclustered rowstore indexes in support of point lookups. If, on the other hand, the workload is still primarily OLTP with only the occasional analytical query, you can add a nonclustered columnstore index to a table stored as a clustered rowstore index. Microsoft's documentation on choosing the right type of columnstore index is worth reading.
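
A hedged sketch of the nonclustered-columnstore option against the table in the question (the index name is illustrative):

```sql
-- Columnstore over every column the aggregation reads; the existing
-- rowstore clustered PK stays in place for OLTP-style point lookups.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_MyTable
ON MyTable ([Date], [GroupId], [TypeId], [UserID],
            [IsA], [IsB], [IsC], [IsEven], [IsOdd]);
```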

Because the initial implementation of columnstore indexes was less than ideal (they were read-only at first), many people have dismissed them from their toolbox. However, you're looking at exactly the kind of query they're meant to solve. Test it: you'll almost certainly see a massive improvement in query performance, even with the use of local variables. As was already pointed out to you, local variables can sometimes cause performance issues because of how they affect row estimates; in many cases a parameter or a hard-coded value performs better, because better row estimates produce a better plan.
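
A minimal sketch of that last point: adding OPTION (RECOMPILE) to the query from the question lets the optimizer see the local variables' actual runtime values and base its row estimates on the real date range (shown here with a shortened SELECT list):

```sql
DECLARE @monthStartDate date = DATEFROMPARTS(2020, 6, 1);
DECLARE @monthEndDate   date = EOMONTH(@monthStartDate);

SELECT [Date], GroupId, TypeId, COUNT(UserID) AS CountTotal
FROM MyTable
WHERE [Date] >= @monthStartDate AND [Date] <= @monthEndDate
GROUP BY [Date], GroupId, TypeId
OPTION (RECOMPILE);  -- compile the plan with the variables' actual values
```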

Answered by Grant Fritchey on October 27, 2021
