Salesforce Asked by SebTHU on October 4, 2021
I’m a bit of a beginner with Salesforce (speciality is SQL and datawarehousing). For the last few months I’ve been looking after an ETL process which pulls data out of SagePeople Cloud (which, confusingly, seems to be Salesforce inside the box).
Method is SSIS (SQL Integration Services) packages, using the Cozyroc Salesforce Source SSIS component. The loads are deltas, so I want to get "all rows updated since [a date time]". The component has an ObjectFilter property, which I set at runtime to
SystemModStamp>=[a date, e.g. 2020-06-30T23:00:00Z]
I’ve already worked out that:
The date parameter passed for each delta load execution is the date/time at which the last (successful) execution of this same load sent its SOQL to Salesforce to get the data (I save this in a variable, from SQL GETUTCDATE()) – less a safety margin, which I initially set to 5 minutes. To save confusion all these date/times are saved as datetimeoffsets at timezone Z. Logs show that this is working correctly:
Execution 1 started talking to Salesforce at 2020-07-07T06:00:00Z
Execution 2 used date parameter 2020-07-07T05:55:00Z, and started its execution at 2020-07-07T09:00:00Z
Execution 3 used date parameter 2020-07-07T08:55:00Z…. and so on.
The problem is: the delta loads are missing updates. New delta load runs 3-hourly. I’m comparing to the old delta load system, which runs daily, and takes the last 2 days of updates (actually last midnight before now, rolled back 2 days: lastmodifieddate = LAST_N_DAYS:2). That system is picking up updates – different column values – which the new system is missing: in every case, the LastModifiedDate and SystemModStamp in the pulled ETL data (old vs new) are identical, but the old system is correct (matches the current state of Salesforce).
There seem to be two types of problem cases:
The solution to (1) might be to wind back 10 or 15 minutes rather than 5. Something to do with long-running transactions inside Salesforce? How long should be allowed for these?
(2) is more mysterious. What I suspect is this:
The only clue I have is that the columns with differences are mostly what I’d call "computed" columns: things like Months_Since_[employee]Start, which might be automatically updated in bulk periodically, rather than being instantly calculated like true computed columns. But some other columns with fewer differences aren’t of this kind: e.g. Email, Gender, Postcode.
There might be something wrong with my date logic (though from the logs, I doubt it). The worrying possibility is that actually SystemModStamp is not reliable for getting deltas out of Salesforce, because some processes don’t update it. Can anyone comment on this? Thanks!
If you want true replication, use the getUpdated and getDeleted API calls. They come in both REST and SOAP flavors. They provide a latestDateCovered attribute that you can use to start the next delta check. Unlike using SOQL, you can guarantee that you'll get the correct delta every time, as long as you use the one attribute provided.
Depending on how you wrote your query, your SystemModStamp deal could miss records, including situations where you set a cap on the upper end as well as the lower end. For example, the SystemModStamp can be off by a few seconds, or may not be updated until after you do your query, etc.
In short, Replication Is Hard. The Replication APIs are specifically designed to make a best effort to ensure that no records are ever missed, no matter what crazy things might be going on in the background. If you have the ability to use them, you should do so. Details are in the relevant documentation (SOAP and REST API docs). Note that this API also gives you deletions, which a SOQL can't do reliably.
Correct answer by sfdcfox on October 4, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP