Database Administrators Asked by Kilazur on October 27, 2021
Here’s my query:
DECLARE @monthStartDate date;
DECLARE @monthEndDate date;
SET @monthStartDate = DATEFROMPARTS(2020, 6, 1);
SET @monthEndDate = EOMONTH(@monthStartDate);

SELECT @monthStartDate, MIN([Date]), @monthEndDate, GroupId, TypeId,
    COUNT(UserID) as 'CountTotal',
    SUM(CASE WHEN [IsA] = 1 THEN 1 ELSE 0 END) as 'CountA',
    SUM(CASE WHEN [IsB] = 1 THEN 1 ELSE 0 END) as 'CountB',
    SUM(CASE WHEN [IsC] = 1 THEN 1 ELSE 0 END) as 'CountC',
    SUM(CASE WHEN [IsEven] = 1 THEN 1 ELSE 0 END) as 'CountEven',
    SUM(CASE WHEN [IsOdd] = 1 THEN 1 ELSE 0 END) as 'CountOdd',
    SUM(CASE WHEN [IsEven] = 1 AND [IsA] = 1 THEN 1 ELSE 0 END) as 'CountAEven',
    SUM(CASE WHEN [IsOdd] = 1 AND [IsA] = 1 THEN 1 ELSE 0 END) as 'CountAOdd',
    SUM(CASE WHEN [IsEven] = 1 AND [IsC] = 1 THEN 1 ELSE 0 END) as 'CountCEven',
    SUM(CASE WHEN [IsOdd] = 1 AND [IsC] = 1 THEN 1 ELSE 0 END) as 'CountCOdd',
    SUM(CASE WHEN [IsEven] = 1 AND [IsB] = 1 THEN 1 ELSE 0 END) as 'CountBEven',
    SUM(CASE WHEN [IsOdd] = 1 AND [IsB] = 1 THEN 1 ELSE 0 END) as 'CountBOdd'
FROM MyTable
WHERE [Date] >= @monthStartDate AND [Date] <= @monthEndDate
GROUP BY [Date], GroupId, TypeId
ORDER BY [Date], GroupId, TypeId
which selects from this table:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [MyTable](
[UserID] [varchar](32) NOT NULL,
[Date] [date] NOT NULL,
[GroupId] [varchar](16) NOT NULL,
[IsEven] [bit] NOT NULL,
[IsOdd] [bit] NOT NULL,
[TypeId] [int] NOT NULL,
[IsA] [bit] NOT NULL,
[IsB] [bit] NOT NULL,
[IsC] [bit] NOT NULL,
CONSTRAINT [PK_Http_Uniques] PRIMARY KEY CLUSTERED
(
[Date] DESC,
[GroupId] ASC,
[UserID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [MyTable] WITH CHECK ADD CONSTRAINT [CK_MyTable_Period] CHECK (([IsIn]<>(0) OR [IsOut]<>(0)))
GO
ALTER TABLE [MyTable] CHECK CONSTRAINT [CK_MyTable_Period]
GO
It works, but the table is going to have millions of rows eventually, and I’m obviously concerned about performance.
Do you see a way to optimize this query? Thanks.
An index on these columns would be key: [Date], GroupId, TypeId.
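A minimal sketch of such an index, assuming SQL Server and the MyTable definition from the question; the index name IX_MyTable_Date_GroupId_TypeId is hypothetical. The key order matches the GROUP BY, so rows can arrive already sorted for aggregation, and the INCLUDE list is meant to let the query be answered from this index alone.

```sql
-- Hypothetical index name. Key columns follow the GROUP BY order
-- ([Date], GroupId, TypeId), so the range seek on [Date] returns rows
-- already ordered for a stream aggregate instead of requiring a sort.
CREATE NONCLUSTERED INDEX IX_MyTable_Date_GroupId_TypeId
    ON [MyTable] ([Date], GroupId, TypeId)
    -- The flag columns the SUM(CASE ...) expressions read.
    -- UserID is omitted here because, as part of the clustered key,
    -- it is already carried in every nonclustered index as the row locator.
    INCLUDE (IsA, IsB, IsC, IsEven, IsOdd);
```

Whether this beats the existing clustered key (which already starts with [Date]) depends on the data; comparing the actual execution plans before and after is the safest check.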
However, whatever date range is "typical" for your query, I would set up partitions of that size. For example, if you typically query a month of data and you partition the table by month, SQL Server only has to read a single partition (i.e., the rows for that month), no matter how much data the table holds. The optimizer can eliminate the other partitions when the query filters on the same column the table is partitioned on. Whether there are 10 rows or 100M, you will see the same or similar performance.
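A minimal sketch of monthly partitioning, assuming SQL Server and the question's MyTable; the function and scheme names (pf_MyTable_Month, ps_MyTable_Month) and the boundary dates are hypothetical placeholders. It rebuilds the existing clustered primary key on the partition scheme so that [Date] becomes the partitioning column.

```sql
-- Hypothetical names and boundaries. RANGE RIGHT assigns each boundary date to
-- the partition on its right, so each partition holds exactly one calendar month.
CREATE PARTITION FUNCTION pf_MyTable_Month (date)
    AS RANGE RIGHT FOR VALUES ('2020-05-01', '2020-06-01', '2020-07-01', '2020-08-01');
GO
-- Map every partition to the PRIMARY filegroup for simplicity.
CREATE PARTITION SCHEME ps_MyTable_Month
    AS PARTITION pf_MyTable_Month ALL TO ([PRIMARY]);
GO
-- Rebuild the clustered primary key on the scheme, partitioned by [Date]
-- ([Date] is already part of the key, which a unique clustered index requires).
ALTER TABLE [MyTable] DROP CONSTRAINT [PK_Http_Uniques];
GO
ALTER TABLE [MyTable] ADD CONSTRAINT [PK_Http_Uniques] PRIMARY KEY CLUSTERED
    ([Date] DESC, [GroupId] ASC, [UserID] ASC)
    ON ps_MyTable_Month ([Date]);
GO
-- Which partition holds June 2020? Returns 3 with the boundaries above.
SELECT $PARTITION.pf_MyTable_Month('2020-06-15') AS PartitionNumber;
```

New months need new boundaries over time (for example with ALTER PARTITION FUNCTION ... SPLIT RANGE), so in practice this is usually paired with a scheduled maintenance job.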
Answered by Doug B on October 27, 2021
If you're attempting to optimize aggregation queries inside SQL Server, the single best tool you can put to work is Columnstore indexes. They are specifically built to handle aggregation of exactly this type of data. This is especially true when we're talking millions of rows of data or more.
The original question doesn't say what version of SQL Server you're running, but if you're on 2016 or greater, you can realize the best possible use of your storage and your indexing. If the majority of the queries against your table are analytical in nature, you can store the table as a clustered columnstore. Then, you can add nonclustered indexes in support of point lookups. If, on the other hand, you're still seeing primarily OLTP queries, with only the occasional analytical query, you can add a nonclustered columnstore to a table stored as a clustered index. You can read more about picking the right columnstore index here.
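A minimal sketch of the two layouts described above, assuming SQL Server 2016 or later and the question's MyTable; the index names are hypothetical. A table can have only one columnstore index, so the two options are alternatives, not steps to run together.

```sql
-- Option 1: mostly OLTP workload with occasional analytical queries.
-- Keep the rowstore clustered primary key and add a nonclustered columnstore
-- over the columns the monthly report reads. (Hypothetical index name.)
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_MyTable_Report
    ON [MyTable] ([Date], GroupId, TypeId, UserID, IsA, IsB, IsC, IsEven, IsOdd);
GO

-- Option 2 (run INSTEAD of Option 1): mostly analytical workload.
-- Store the table itself as a clustered columnstore, then re-add the
-- primary key as a nonclustered B-tree for uniqueness and point lookups.
ALTER TABLE [MyTable] DROP CONSTRAINT [PK_Http_Uniques];
GO
CREATE CLUSTERED COLUMNSTORE INDEX CCI_MyTable ON [MyTable];
GO
ALTER TABLE [MyTable] ADD CONSTRAINT [PK_Http_Uniques]
    PRIMARY KEY NONCLUSTERED ([Date] DESC, [GroupId] ASC, [UserID] ASC);
GO
```

The aggregation query itself does not change; the optimizer decides whether to read the columnstore, typically in batch mode for large scans like this one.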
Because the initial implementation of columnstore indexes was less than ideal (they were read-only at first), many people have dismissed them from their toolbox. However, you're looking at exactly the kind of query they're meant to solve. Test it. Guaranteed, you'll see a massive improvement in query performance, even with the use of local variables. As was already pointed out to you, local variables can sometimes cause performance issues because of how they affect row estimates; in many cases, a parameter value or a hard-coded value may perform better, because better row estimates lead to a better plan.
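One way to turn the local variable into a parameter is to wrap the query in a stored procedure. This is a minimal sketch, assuming SQL Server 2016 SP1 or later (for CREATE OR ALTER); the procedure name dbo.GetMonthlyCounts is hypothetical, and only a few of the original count columns are repeated for brevity.

```sql
-- Hypothetical procedure name. @monthStartDate is a parameter here, not a
-- local variable, so its value can be sniffed when the plan is compiled.
CREATE OR ALTER PROCEDURE dbo.GetMonthlyCounts
    @monthStartDate date
AS
BEGIN
    SET NOCOUNT ON;

    SELECT [Date], GroupId, TypeId,
           COUNT(UserID) AS CountTotal,
           SUM(CASE WHEN [IsA] = 1 THEN 1 ELSE 0 END) AS CountA,
           SUM(CASE WHEN [IsB] = 1 THEN 1 ELSE 0 END) AS CountB,
           SUM(CASE WHEN [IsC] = 1 THEN 1 ELSE 0 END) AS CountC
           -- Remaining SUM(CASE ...) columns go here, as in the original query.
    FROM [MyTable]
    WHERE [Date] >= @monthStartDate
      AND [Date] <= EOMONTH(@monthStartDate)
    GROUP BY [Date], GroupId, TypeId
    ORDER BY [Date], GroupId, TypeId;
END;
GO

EXEC dbo.GetMonthlyCounts @monthStartDate = '2020-06-01';
```

Because the plan is cached for the first sniffed value, it is worth checking that later months with very different row counts still get a reasonable plan.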
Answered by Grant Fritchey on October 27, 2021