Database Administrators Asked on January 2, 2022
The situation: I am using Service Broker to build a data push solution, and I am considering the following scenario.
Scripts for the Service Broker installation in the Broker database:
-- installation
use master
go
if exists ( select * from sys.databases where name = 'Broker' )
begin
alter database [Broker] set restricted_user with rollback immediate;
drop database [Broker];
end
go
create database [Broker]
go
alter database [Broker] set enable_broker with rollback immediate;
alter database [Broker] set read_committed_snapshot on;
alter database [Broker] set allow_snapshot_isolation on;
alter database [Broker] set recovery full;
go
use [Broker]
go
create message type datachanges_messagetype
validation = none;
go
create contract datachanges_contract ( datachanges_messagetype sent by initiator );
go
create queue dbo.datachanges_initiatorqueue
with status = on
, retention = off
, poison_message_handling ( status = on )
on [default];
go
create queue dbo.datachanges_targetqueue
with status = on
, retention = off
, poison_message_handling ( status = on )
on [default];
go
create service datachanges_initiatorservice
on queue datachanges_initiatorqueue
( datachanges_contract );
go
create service datachanges_targetservice
on queue datachanges_targetqueue
( datachanges_contract );
go
-- conversation additional table
create table dbo.[SessionConversationsSPID] (
spid int not null
, handle uniqueidentifier not null
, primary key ( spid )
, unique ( handle )
)
go
-- SP which is used to send data from triggers
create procedure dbo.trackChanges_send
@json nvarchar(max)
as
begin
set nocount on;
if ( @json is null or @json = '' )
begin
raiserror( 'DWH Service Broker: An attempt to send an empty message occurred', 16, 1);
return;
end
declare @handle uniqueidentifier = null
, @counter int = 1
, @error int;
begin transaction
while ( 1 = 1 )
begin
select @handle = handle
from dbo.[SessionConversationsSPID]
where spid = @@SPID;
if @handle is null
begin
begin dialog conversation @handle
from service datachanges_initiatorservice
to service 'datachanges_targetservice'
on contract datachanges_contract
with encryption = off;
insert into dbo.[SessionConversationsSPID] ( spid, handle )
values ( @@SPID, @handle );
end;
send on conversation @handle
message type datachanges_messagetype( @json );
set @error = @@error;
if @error = 0
break;
set @counter += 1;
if @counter > 5
begin
-- database and default schema of the caller, used only for the error message
declare @db sysname = db_name(), @sch sysname = schema_name();
declare @mes nvarchar(max) = 'db - ' + @db + '. schema - ' + @sch;
raiserror( N'DWH Service Broker: Failed to SEND on a conversation more than 5 times. Source: %s. Error: %i.', 16, 2, @mes, @error );
break;
end
delete from dbo.[SessionConversationsSPID]
where handle = @handle;
set @handle = null;
end
commit;
end
go
-- Pre-create dialogs to mitigate the hot spot problem on the sys.sysdesend table,
-- as described here: https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd576261
declare @i int, @spid int, @handle uniqueidentifier
select @i = 0, @spid = 50;
while (@i < 150*3000) -- 450000 dialogs
begin
set @i = @i + 1
begin dialog @handle
from service datachanges_initiatorservice
to service 'datachanges_targetservice'
on contract datachanges_contract
with encryption = off;
if ((@i % 150) = 0)
begin
set @spid += 1;
insert into dbo.SessionConversationsSPID ( spid, handle ) values (@spid, @handle)
end
end
Typical trigger code in a user database:
create trigger [<SCHEMA>].[<TABLE>_TR_I]
on [<SCHEMA>].[<TABLE>]
with execute as caller
after insert
as
begin
set xact_abort off;
set nocount on;
declare @rc int = ( select count(*) from inserted );
if ( @rc = 0 )
begin
return;
end
begin try
declare @db_name sysname = db_name();
declare @json nvarchar(max);
set @json = (
select getutcdate() as get_date, 'I' as tr_operation, current_transaction_id() as cur_tran_id, '<TABLE>' as table_name, @@servername as server_name, @db_name as db_name, '<SCHEMA>' as tenant_schemaname
, *
from inserted
for json auto, include_null_values
);
exec dbo.trackChanges_send
@json = @json;
end try
begin catch
declare @error_message nvarchar(max);
set @error_message = '[' + isnull( cast( error_number() as nvarchar( max ) ), '' ) + '] '
+ isnull( cast( error_severity() as nvarchar( max ) ), '' )
+ ' State: ' + isnull( cast( error_state() as nvarchar( max ) ), '' )
+ ' Trigger: [<SCHEMA>].[<TABLE>_TR_I]'
+ ' Line: ' + isnull( cast( error_line() as nvarchar( max ) ), '' )
+ ' Msg: ' + isnull( cast( error_message() as nvarchar( max ) ), '' );
raiserror( 'DWH Service Broker: An error occurred while sending data changes. Error: %s', 0, 0, @error_message ) with log;
return;
end catch
end
go
So, my questions are below, but first some observations.
Periodically many sessions wait on the same page. Using dbcc page I can see that this page belongs to sys.queue_messages_597577167, which is the internal table backing dbo.datachanges_targetqueue. At that moment roughly 450 sessions were waiting, so it can be a bottleneck.
Meanwhile, an xEvent session that tracks trigger execution shows that during these periods the triggers run for a long time (more than 10 seconds, when usually it is less than 1 second).
This happens at random times, so I do not see any pattern here.
declare @i int = 0;
while ( 1 = 1 )
begin
declare @mb varbinary( max );
receive top ( 1000 ) @mb = message_body from dbo.datachanges_targetqueue
set @i = @@rowcount;
if @i = 0
break;
end
This reader script, which is executed periodically, can also be blocked by the trigger activity, and I do not understand why.
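One way to see what the reader is actually waiting on at those moments is a query like the following (a diagnostic sketch, not part of the original scripts):

select session_id, blocking_session_id, wait_type, wait_resource, wait_time, command
from sys.dm_exec_requests
where blocking_session_id <> 0
   or wait_type like N'PAGELATCH%';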
Is it OK to use one queue and ~800000 triggers? 🙂 I mean, maybe there are some thresholds I need to consider.
What are the advantages/disadvantages of "my" approach (one database is both the sender and the target) versus "every database is a sender and one database is the target"?
Is it OK to use one queue and ~800000 triggers? :) I mean, maybe there are some thresholds I need to consider.
No. Not really. You must ensure that your triggers are always short-running or your throughput will suffer.
Having 800000 triggers write to a single queue is not going to be a good idea. A queue is backed by a regular table, and at some scale page hotspots are going to be your bottleneck. And:
Messages sent to services in the same instance of the Database Engine are put directly into the queues associated with these services.
If your target service is on a remote SQL Server instance, then the messages will be written and committed to each database's transmission queue. But for target queues on the same instance, messages go directly to the target queue.
I think the bottom line is that writing directly to the target queue is not the right solution here. Imagine having an empty target queue at the time of peak transaction throughput. That queue's backing table simply doesn't have enough pages to spread out the page latching to accommodate a large number of concurrent writers needed in this scenario.
And if all your tables are in the same database, then that database's transmission queue could become the bottleneck instead. But the transmission queue has a different structure from normal queues: it has a single clustered index:
select i.name index_name, i.type, c.name, c.column_id, t.name type_name, c.max_length, ic.key_ordinal
from sys.indexes i
join sys.index_columns ic
  on ic.object_id = i.object_id
 and ic.index_id = i.index_id
join sys.columns c
  on c.object_id = ic.object_id
 and c.column_id = ic.column_id
join sys.types t
  on t.system_type_id = c.system_type_id
 and t.user_type_id = c.user_type_id
where c.object_id = object_id('sys.sysxmitqueue')
which outputs:
index_name type name column_id type_name max_length key_ordinal
----------- ---- ------------ ----------- --------------------- ---------- -----------
clst 1 dlgid 1 uniqueidentifier 16 1
clst 1 finitiator 2 bit 1 2
clst 1 msgseqnum 8 bigint 8 3
So you won't have hot page contention on the transmission queue, and you'll have as many insertion points as you have dialog conversations (dlgid).
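If the target service were remote, you could watch the depth of the transmission queue and how the pending messages spread across dialogs through sys.transmission_queue, the public view over sys.sysxmitqueue. A purely illustrative query:

select top ( 20 )
       conversation_handle,
       count(*)          as pending_messages,
       min(enqueue_time) as oldest_enqueue_time
from sys.transmission_queue
group by conversation_handle
order by pending_messages desc;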
A normal queue has two indexes, a clustered index on
( status, conversation_group_id, priority, conversation_handle, queuing_order )
and a non-clustered index on
( status, priority, queuing_order, conversation_group_id, conversation_handle, service_id )
which you can see with this query
select q.name queue_name, i.name index_name, i.index_id, ic.index_id, i.type, c.name column_name, c.column_id, t.name type_name, c.max_length, ic.key_ordinal
from sys.service_queues q
join sys.internal_tables it
  on it.parent_object_id = q.object_id
join sys.indexes i
  on i.object_id = it.object_id
join sys.index_columns ic
  on ic.object_id = i.object_id
 and ic.index_id = i.index_id
join sys.columns c
  on c.object_id = ic.object_id
 and c.column_id = ic.column_id
join sys.types t
  on t.system_type_id = c.system_type_id
 and t.user_type_id = c.user_type_id
order by q.object_id, i.index_id, ic.key_ordinal
So you might be better off moving the target service to a remote SQL Server instance. This would offload the writing and reading of the target queue and might relieve the bottleneck. Your triggers would only have to put the message on the transmission queue, which is what you thought was happening in the first place.
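For completeness, the plumbing for a remote target looks roughly like this. It is only a sketch under assumed names (RemoteHost, port 4022, and a placeholder broker GUID); the endpoint security and routing details need to be worked out for your environment:

-- on each sending instance: a Service Broker endpoint and a route to the remote target service
create endpoint BrokerEndpoint
    state = started
    as tcp ( listener_port = 4022 )
    for service_broker ( authentication = windows, encryption = supported );
go
create route datachanges_targetroute
with service_name   = 'datachanges_targetservice',
     broker_instance = '<service_broker_guid of the target database>',  -- from sys.databases
     address         = 'tcp://RemoteHost:4022';
go
-- the target database on the remote instance also needs a route back to the initiator
-- service so that acknowledgements can be delivered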
You can watch the routing and transmission queue usage with an Extended Events session like:
CREATE EVENT SESSION [ServiceBrokerRouting] ON SERVER
ADD EVENT sqlserver.broker_dialog_transmission_body_dequeue,
ADD EVENT sqlserver.broker_dialog_transmission_queue_enqueue,
ADD EVENT sqlserver.broker_forwarded_message_sent,
ADD EVENT sqlserver.broker_message_classify,
ADD EVENT sqlserver.broker_remote_message_acknowledgement
-- a target (ring_buffer here) so the captured events can be inspected later
ADD TARGET package0.ring_buffer;
Also, in both your current design and the remote service option, you can see from the index structures how reusing the right number of dialog conversations can optimize the solution. Too few and you have locking and page contention issues; too many and you have the overhead of creating and managing them, and you can't batch messages. It looks like you've already read Reusing Conversations and are using a conversation-per-session pattern, which Remus recommends for this kind of workload. It would be interesting to see which index the page latch contention is on, and whether it's a leaf or non-leaf page. But in any case, queue tables with concurrent SEND and RECEIVE don't usually have enough pages to spread out page latch contention.
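On SQL Server 2019 or later you can answer that question without DBCC PAGE. A sketch, assuming the waiting sessions show PAGELATCH_* waits and page_resource is populated:

select er.session_id,
       er.wait_type,
       object_name( pi.object_id, pi.database_id ) as object_name,
       pi.index_id,
       pi.page_type_desc,
       pi.page_level
from sys.dm_exec_requests er
cross apply sys.fn_PageResCracker ( er.page_resource ) pc
cross apply sys.dm_db_page_info ( pc.db_id, pc.file_id, pc.page_id, 'DETAILED' ) pi
where er.wait_type like N'PAGELATCH%';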
So a design alternative is to have the triggers drop changes onto N intermediate queues, and then have activation procedures on those queues forward the messages to the single destination queue. You may still have waits on the destination queue, but they won't happen during your triggers. Also, in the intermediate-to-final activation procedure you can batch up sends, manage conversations, and use far fewer dialog conversations (N), so the receiver can actually fetch 1000 messages per call. A single call to RECEIVE can only fetch messages from a single conversation group, so if you have thousands of conversations interleaved you'll only ever fetch single rows.
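A minimal sketch of such an intermediate-to-final activation procedure follows. It assumes hypothetical objects dbo.datachanges_intermediatequeue_01 (one of the N intermediate queues) and dbo.ForwardConversations (a small table holding one long-lived forwarding conversation per intermediate queue); none of these names come from the original scripts, and end-of-dialog and error message handling is omitted for brevity:

create procedure dbo.datachanges_forward_01
as
begin
    set nocount on;
    declare @batch table
    (
        queuing_order     bigint,
        message_type_name sysname,
        message_body      varbinary(max)
    );
    declare @fwd_handle uniqueidentifier, @body varbinary(max);

    while ( 1 = 1 )
    begin
        delete from @batch;

        begin transaction;

        -- drain this intermediate queue in batches of up to 1000 messages
        waitfor
        (
            receive top ( 1000 )
                    queuing_order, message_type_name, message_body
            from dbo.datachanges_intermediatequeue_01
            into @batch
        ), timeout 5000;

        if not exists ( select 1 from @batch )
        begin
            commit;
            break;
        end

        -- one long-lived conversation from this queue to the final destination service
        select @fwd_handle = handle
        from dbo.ForwardConversations
        where queue_number = 1;

        -- forward the payloads to the single destination queue
        declare cur cursor local fast_forward for
            select message_body
            from @batch
            where message_type_name = N'datachanges_messagetype'
            order by queuing_order;
        open cur;
        fetch next from cur into @body;
        while @@fetch_status = 0
        begin
            send on conversation @fwd_handle
                message type datachanges_messagetype ( @body );
            fetch next from cur into @body;
        end
        close cur;
        deallocate cur;

        commit;
    end
end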
Or simply have N destination queues and have your readers read from all of them.
There's no fundamental reason why you can't get this working, but it's not going to be simple. The scale is immense, and Service Broker is complex. You should also consider third-party solutions here. Qlik (Attunity) has a log-based CDC solution for SQL Server that can harvest all the changes from the SQL Server transaction log without triggers or queues. And there are several other solutions based on SQL Server Change Data Capture. Change Data Capture stages all the changes for you, and you just have to consume them from your external program. Change Tracking is the most lightweight option, but it doesn't capture the intermediate versions of rows: you know which rows changed and whether the change was an insert, update, or delete, but you can only query the current version of each row. Every one of these options is going to be expensive and tricky, and will require significant testing.
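For comparison, staging changes with built-in Change Data Capture needs no triggers or queues at all. A minimal sketch for one table (the table name dbo.SomeTrackedTable is just an example):

exec sys.sp_cdc_enable_db;
go
exec sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'SomeTrackedTable',
     @role_name     = null;
go
-- the external consumer then polls the change functions, for example:
declare @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_SomeTrackedTable'),
        @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
select *
from cdc.fn_cdc_get_all_changes_dbo_SomeTrackedTable ( @from_lsn, @to_lsn, N'all' );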
Answered by David Browne - Microsoft on January 2, 2022