Database Administrators Asked on January 2, 2022
The situation: I am using Service Broker to build a data push solution, and I am considering the following scenario.
Scripts for the Service Broker installation in the Broker database:
-- installation
use master
go
if exists ( select * from sys.databases where name = 'Broker' )
begin
alter database [Broker] set restricted_user with rollback immediate;
drop database [Broker];
end
go
create database [Broker]
go
alter database [Broker] set enable_broker with rollback immediate;
alter database [Broker] set read_committed_snapshot on;
alter database [Broker] set allow_snapshot_isolation on;
alter database [Broker] set recovery full;
go
use [Broker]
go
create message type datachanges_messagetype
validation = none;
go
create contract datachanges_contract ( datachanges_messagetype sent by initiator );
go
create queue dbo.datachanges_initiatorqueue
with status = on
, retention = off
, poison_message_handling ( status = on )
on [default];
go
create queue dbo.datachanges_targetqueue
with status = on
, retention = off
, poison_message_handling ( status = on )
on [default];
go
create service datachanges_initiatorservice
on queue datachanges_initiatorqueue
( datachanges_contract );
go
create service datachanges_targetservice
on queue datachanges_targetqueue
( datachanges_contract );
go
-- conversation additional table
create table dbo.[SessionConversationsSPID] (
spid int not null
, handle uniqueidentifier not null
, primary key ( spid )
, unique ( handle )
)
go
-- SP which is used to send data from triggers
create procedure dbo.trackChanges_send
@json nvarchar(max)
as
begin
set nocount on;
if ( @json is null or @json = '' )
begin
raiserror( 'DWH Service Broker: An attempt to send an empty message occurred', 16, 1);
return;
end
declare @handle uniqueidentifier = null
, @counter int = 1
, @error int;
begin transaction
while ( 1 = 1 )
begin
select @handle = handle
from dbo.[SessionConversationsSPID]
where spid = @@SPID;
if @handle is null
begin
begin dialog conversation @handle
from service datachanges_initiatorservice
to service 'datachanges_targetservice'
on contract datachanges_contract
with encryption = off;
insert into dbo.[SessionConversationsSPID] ( spid, handle )
values ( @@SPID, @handle );
end;
send on conversation @handle
message type datachanges_messagetype( @json );
set @error = @@error;
if @error = 0
break;
set @counter += 1;
if @counter > 5
begin
-- database and default schema of the caller, used only for the error message
declare @db sysname = db_name(), @sch sysname = schema_name();
declare @mes nvarchar(max) = 'db - ' + @db + '. schema - ' + @sch;
raiserror( N'DWH Service Broker: Failed to SEND on a conversation more than 5 times. Source: %s. Error: %i.', 16, 2, @mes, @error );
break;
end
delete from dbo.[SessionConversationsSPID]
where handle = @handle;
set @handle = null;
end
commit;
end
go
-- Pre-create dialogs to mitigate the hot spot problem on the sys.sysdesend table,
-- as described here: https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd576261
declare @i int, @spid int, @handle uniqueidentifier
select @i = 0, @spid = 50;
while (@i < 150*3000) -- 450000 dialogs
begin
set @i = @i + 1
begin dialog @handle
from service datachanges_initiatorservice
to service 'datachanges_targetservice'
on contract datachanges_contract
with encryption = off;
if ((@i % 150) = 0)
begin
set @spid += 1;
insert into dbo.SessionConversationsSPID ( spid, handle ) values (@spid, @handle)
end
end
Typical trigger code in a user database:
create trigger [<SCHEMA>].[<TABLE>_TR_I]
on [<SCHEMA>].[<TABLE>]
with execute as caller
after insert
as
begin
set xact_abort off;
set nocount on;
declare @rc int = ( select count(*) from inserted );
if ( @rc = 0 )
begin
return;
end
begin try
declare @db_name sysname = db_name();
declare @json nvarchar(max);
set @json = (
select getutcdate() as get_date, 'I' as tr_operation, current_transaction_id() as cur_tran_id, '<TABLE>' as table_name, @@servername as server_name, @db_name as db_name, '<SCHEMA>' as tenant_schemaname
, *
from inserted
for json auto, include_null_values
);
exec dbo.trackChanges_send
@json = @json;
end try
begin catch
declare @error_message nvarchar(max);
set @error_message = '[' + isnull( cast( error_number() as nvarchar( max ) ), '' ) + '] '
+ isnull( cast( error_severity() as nvarchar( max ) ), '' )
+ ' State: ' + isnull( cast( error_state() as nvarchar( max ) ), '' )
+ ' Trigger: [<SCHEMA>].[<TABLE>_TR_I]'
+ ' Line: ' + isnull( cast( error_line() as nvarchar( max ) ), '' )
+ ' Msg: ' + isnull( cast( error_message() as nvarchar( max ) ), '' );
raiserror( 'DWH Service Broker: An error occurred while sending data changes. Error: %s', 0, 0, @error_message ) with log;
return;
end catch
end
go
So, my questions are below, but first some observations.
Periodically many sessions wait on the same page. Using dbcc page I can see that this page belongs to sys.queue_messages_597577167, which is the internal table backing dbo.datachanges_targetqueue. At that moment roughly 450 sessions were waiting, so it can be a bottleneck.
Meanwhile, an xEvent session that tracks trigger execution shows that during these periods the triggers run for a long time (more than 10 seconds, when usually it is less than 1 second).
This happens at random times, so I do not see any pattern here.
declare @i int = 0;
while ( 1 = 1 )
begin
declare @mb varbinary( max );
receive top ( 1000 ) @mb = message_body from dbo.datachanges_targetqueue
set @i = @@rowcount;
if @i = 0
break;
end
This reader script, which is executed periodically, can also be blocked by the trigger activity, and I do not understand why.
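One way to see what the reader is actually waiting on at those moments is a query like the following (a diagnostic sketch, not part of the original scripts):

select session_id, blocking_session_id, wait_type, wait_resource, wait_time, command
from sys.dm_exec_requests
where blocking_session_id <> 0
   or wait_type like N'PAGELATCH%';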
Is it OK to use one queue and ~800000 triggers? 🙂 I mean, maybe there are some thresholds I need to consider.
What are the advantages/disadvantages of "my" approach (one database is both the sender and the target) versus "every database is a sender and one database is the target"?
Is it OK to use one queue and ~800000 triggers? :) I mean, maybe there are some thresholds I need to consider.
No. Not really. You must ensure that your triggers are always short-running or your throughput will suffer.
Having 800000 triggers write to a single queue is not going to be a good idea. A queue is backed by a regular table, and at some scale page hotspots are going to be your bottleneck. And:
Messages sent to services in the same instance of the Database Engine are put directly into the queues associated with these services.
If your target service is on a remote SQL Server instance, then the messages will be written and committed to each database's transmission queue. But for target queues on the same instance, messages go directly to the target queue.
I think the bottom line is that writing directly to the target queue is not the right solution here. Imagine having an empty target queue at the time of peak transaction throughput. That queue's backing table simply doesn't have enough pages to spread out the page latching to accommodate a large number of concurrent writers needed in this scenario.
And if all your tables are in the same database, then that database's transmission queue could become the bottleneck instead. But the transmission queue has a different structure from normal queues: it has a single clustered index:
select i.name index_name, i.type, c.name, c.column_id, t.name type_name, c.max_length, ic.key_ordinal
from sys.indexes i
join sys.index_columns ic
  on ic.object_id = i.object_id
 and ic.index_id = i.index_id
join sys.columns c
  on c.object_id = ic.object_id
 and c.column_id = ic.column_id
join sys.types t
  on t.system_type_id = c.system_type_id
 and t.user_type_id = c.user_type_id
where c.object_id = object_id('sys.sysxmitqueue')
which outputs:
index_name type name column_id type_name max_length key_ordinal
----------- ---- ------------ ----------- --------------------- ---------- -----------
clst 1 dlgid 1 uniqueidentifier 16 1
clst 1 finitiator 2 bit 1 2
clst 1 msgseqnum 8 bigint 8 3
So you won't have hot page contention on the transmission queue, and you'll have as many insertion points as you have dialog conversations (dlgid).
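If the target service were remote, you could watch the depth of the transmission queue and how the pending messages spread across dialogs through sys.transmission_queue, the public view over sys.sysxmitqueue. A purely illustrative query:

select top ( 20 )
       conversation_handle,
       count(*)          as pending_messages,
       min(enqueue_time) as oldest_enqueue_time
from sys.transmission_queue
group by conversation_handle
order by pending_messages desc;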
A normal queue has two indexes, a clustered index on
( status, conversation_group_id, priority, conversation_handle, queuing_order )
and a non-clustered index on
( status, priority, queuing_order, conversation_group_id, conversation_handle, service_id )
which you can see with this query
select q.name queue_name, i.name index_name, i.index_id, ic.index_id, i.type, c.name column_name, c.column_id, t.name type_name, c.max_length, ic.key_ordinal
from sys.service_queues q
join sys.internal_tables it
  on it.parent_object_id = q.object_id
join sys.indexes i
  on i.object_id = it.object_id
join sys.index_columns ic
  on ic.object_id = i.object_id
 and ic.index_id = i.index_id
join sys.columns c
  on c.object_id = ic.object_id
 and c.column_id = ic.column_id
join sys.types t
  on t.system_type_id = c.system_type_id
 and t.user_type_id = c.user_type_id
order by q.object_id, i.index_id, ic.key_ordinal
So you might be better off moving the target service to a remote SQL Server instance. This would offload the writing and reading of the target queue and might relieve the bottleneck. Your triggers would only have to put the message on the transmission queue, which is what you thought was happening in the first place.
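For completeness, the plumbing for a remote target looks roughly like this. It is only a sketch under assumed names (RemoteHost, port 4022, and a placeholder broker GUID); the endpoint security and routing details need to be worked out for your environment:

-- on each sending instance: a Service Broker endpoint and a route to the remote target service
create endpoint BrokerEndpoint
    state = started
    as tcp ( listener_port = 4022 )
    for service_broker ( authentication = windows, encryption = supported );
go
create route datachanges_targetroute
with service_name   = 'datachanges_targetservice',
     broker_instance = '<service_broker_guid of the target database>',  -- from sys.databases
     address         = 'tcp://RemoteHost:4022';
go
-- the target database on the remote instance also needs a route back to the initiator
-- service so that acknowledgements can be delivered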
You can watch the routing and transmission queue usage with an Extended Events session like:
CREATE EVENT SESSION [ServiceBrokerRouting] ON SERVER
ADD EVENT sqlserver.broker_dialog_transmission_body_dequeue,
ADD EVENT sqlserver.broker_dialog_transmission_queue_enqueue,
ADD EVENT sqlserver.broker_forwarded_message_sent,
ADD EVENT sqlserver.broker_message_classify,
ADD EVENT sqlserver.broker_remote_message_acknowledgement
-- a target (ring_buffer here) so the captured events can be inspected later
ADD TARGET package0.ring_buffer;
Also, in both your current design and the remote service option, you can see from the index structures how reusing the right number of dialog conversations can optimize the solution. Too few and you have locking and page contention issues; too many and you have the overhead of creating and managing them, and you can't batch messages. It looks like you've already read Reusing Conversations and are using a conversation-per-session pattern, which Remus recommends for this kind of workload. It would be interesting to see which index the page latch contention is on, and whether it's a leaf or non-leaf page. But in any case, queue tables with concurrent SEND and RECEIVE don't usually have enough pages to spread out page latch contention.
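On SQL Server 2019 or later you can answer that question without DBCC PAGE. A sketch, assuming the waiting sessions show PAGELATCH_* waits and page_resource is populated:

select er.session_id,
       er.wait_type,
       object_name( pi.object_id, pi.database_id ) as object_name,
       pi.index_id,
       pi.page_type_desc,
       pi.page_level
from sys.dm_exec_requests er
cross apply sys.fn_PageResCracker ( er.page_resource ) pc
cross apply sys.dm_db_page_info ( pc.db_id, pc.file_id, pc.page_id, 'DETAILED' ) pi
where er.wait_type like N'PAGELATCH%';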
So a design alternative is to have the triggers drop changes onto N intermediate queues, and then have activation procedures on those queues forward the messages to the single destination queue. You may still have waits on the destination queue, but they won't happen during your triggers. Also, in the intermediate-to-final activation procedure you can batch up sends, manage conversations, and use far fewer dialog conversations (N), so the receiver can actually fetch 1000 messages per call. A single call to RECEIVE can only fetch messages from a single conversation group, so if you have thousands of conversations interleaved you'll only ever fetch single rows.
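A minimal sketch of such an intermediate-to-final activation procedure follows. It assumes hypothetical objects dbo.datachanges_intermediatequeue_01 (one of the N intermediate queues) and dbo.ForwardConversations (a small table holding one long-lived forwarding conversation per intermediate queue); none of these names come from the original scripts, and end-of-dialog and error message handling is omitted for brevity:

create procedure dbo.datachanges_forward_01
as
begin
    set nocount on;
    declare @batch table
    (
        queuing_order     bigint,
        message_type_name sysname,
        message_body      varbinary(max)
    );
    declare @fwd_handle uniqueidentifier, @body varbinary(max);

    while ( 1 = 1 )
    begin
        delete from @batch;

        begin transaction;

        -- drain this intermediate queue in batches of up to 1000 messages
        waitfor
        (
            receive top ( 1000 )
                    queuing_order, message_type_name, message_body
            from dbo.datachanges_intermediatequeue_01
            into @batch
        ), timeout 5000;

        if not exists ( select 1 from @batch )
        begin
            commit;
            break;
        end

        -- one long-lived conversation from this queue to the final destination service
        select @fwd_handle = handle
        from dbo.ForwardConversations
        where queue_number = 1;

        -- forward the payloads to the single destination queue
        declare cur cursor local fast_forward for
            select message_body
            from @batch
            where message_type_name = N'datachanges_messagetype'
            order by queuing_order;
        open cur;
        fetch next from cur into @body;
        while @@fetch_status = 0
        begin
            send on conversation @fwd_handle
                message type datachanges_messagetype ( @body );
            fetch next from cur into @body;
        end
        close cur;
        deallocate cur;

        commit;
    end
end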
Or simply have N destination queues and have your readers read from all of them.
There's no fundamental reason why you can't get this working, but it's not going to be simple. The scale is immense, and Service Broker is complex. You should also consider third-party solutions here. Qlik (Attunity) has a log-based CDC solution for SQL Server that can harvest all the changes from the SQL Server transaction log without triggers or queues. And there are several other solutions based on SQL Server Change Data Capture. Change Data Capture stages all the changes for you, and you just have to consume them from your external program. Change Tracking is the most lightweight option, but it doesn't capture the intermediate versions of rows: you know which rows changed and whether the change was an insert, update, or delete, but you can only query the current version of each row. Every one of these options is going to be expensive and tricky, and will require significant testing.
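For comparison, staging changes with built-in Change Data Capture needs no triggers or queues at all. A minimal sketch for one table (the table name dbo.SomeTrackedTable is just an example):

exec sys.sp_cdc_enable_db;
go
exec sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'SomeTrackedTable',
     @role_name     = null;
go
-- the external consumer then polls the change functions, for example:
declare @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_SomeTrackedTable'),
        @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
select *
from cdc.fn_cdc_get_all_changes_dbo_SomeTrackedTable ( @from_lsn, @to_lsn, N'all' );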
Answered by David Browne - Microsoft on January 2, 2022