Creative way of coding

Administration use case

As a Salesforce administrator, I want to identify the fields that have not been used for a long time.
My object is
  • hosting millions of records
  • heavily customized

I want to do this for multiple objects

Constraints

I need to
  • Optimize the network
    I don't want to download gigabytes of data over the network
  • Optimize the API
    I have a limited number of calls per 24h
  • Automate the process
    I will not do this manually; it has to be automated
  • Optimize the duration
    I should not wait too long to get the information

Large Data Volume orgs are difficult to query

  • Timeouts can occur frequently
  • Queries can hit the long-running query limit

The way you use the API can also impact performance (HTTP/1.1, keep-alive, gzip compression), but that is out of scope for this blog.

Reducing the scope

We will not analyze:
  • Formula fields
  • Non-nillable fields (marked as required)
  • Checkboxes (always true or false, no way to tell whether the value was ever set)
  • System fields

We must choose the date field to be used: CreatedDate or LastModifiedDate? Also consider how data migrations impact those fields.

Solution 1 - download + process

Use any tool, such as Data Loader, to export the object, then parse the resulting CSV file locally.
Impacts:
+ Moderate API usage
- Long time to prepare the file on the server side
- Long time to download the file
- Everything is downloaded even if the first records tell you that all fields are used: a waste of resources and time.
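The local-processing step can be sketched as follows; the column names, export format, and sample rows are hypothetical:

```python
# Sketch of Solution 1's local step: scan the exported CSV and keep,
# per field, the most recent date at which it held a value.

import csv
import io

def scan_csv(text, fields, date_field="LastModifiedDate"):
    """Return {field: most recent date the field was non-empty}."""
    last_used = {}
    for row in csv.DictReader(io.StringIO(text)):
        for f in fields:
            if row.get(f):  # a non-empty cell means the field was used
                d = row[date_field]
                if f not in last_used or d > last_used[f]:
                    last_used[f] = d
    return last_used

csv_text = """LastModifiedDate,Color__c,Size__c
2018-06-01,red,
2018-03-15,,XL
2017-01-02,blue,
"""
print(scan_csv(csv_text, ["Color__c", "Size__c"]))
# {'Color__c': '2018-06-01', 'Size__c': '2018-03-15'}
```

Note that the whole file must be scanned even when the first rows already answer the question, which is exactly the waste this solution suffers from.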

[Figure: Solution 1 timeline. Run the query, prepare the resulting file for download, poll with API calls until it is ready for download, download, then process locally. Not efficient if all the fields have been recently used, as everything is still downloaded.]

Solution 2 - API + continuous process

Use the API to run your query, then call queryMore() until you have found the last used date for all fields. Records arrive in batches of BatchSize: the more fields you query, the smaller the batch. 250 fields can drive the batch size down to 200 records, while 3 fields can give you 2000 records per batch.
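The stop-early scan over queryMore() batches can be sketched like this; the field names and batch contents are made up for illustration:

```python
# Sketch of Solution 2's loop: scan batches of records (newest first),
# record the first date at which each field is seen non-null, and stop
# as soon as every field is resolved.

def last_used_dates(batches, fields):
    found = {}
    for batch in batches:  # each batch stands for one queryMore() round trip
        for record in batch:
            for f in fields:
                if f not in found and record.get(f) is not None:
                    found[f] = record["LastModifiedDate"]
            if len(found) == len(fields):
                return found  # stop early: no need to fetch more batches
    return found

batches = [
    [{"LastModifiedDate": "2018-06-01", "Color__c": "red", "Size__c": None}],
    [{"LastModifiedDate": "2018-03-15", "Color__c": None, "Size__c": "XL"}],
    [{"LastModifiedDate": "2017-01-02", "Color__c": "blue", "Size__c": None}],
]
print(last_used_dates(batches, ["Color__c", "Size__c"]))
# {'Color__c': '2018-06-01', 'Size__c': '2018-03-15'}
```

Here the third batch is never fetched, which is the saving this solution brings over Solution 1.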
Impacts:
+ Optimizes the number of records retrieved, as you can stop the process as soon as you have the information for all your fields
- Heavy API consumption (small batch size, huge number of records)
- Long duration: many API round trips plus a long initial query
[Figure: Solution 2 timeline. Server time before the first result, then N x (processing + queryMore()). Can hit timeouts; slow when all fields have been recently used; not efficient for the last fields to be found, as all fields are still retrieved.]

Comparing both solutions
[Figure: Duration comparison of Solutions 1 and 2: API usage, network usage, and end of processing for each.]

Solution 3 - refined query

Use the API to run your query. Call queryMore() until you find the last used date for all fields, or issue a new query() with fewer fields whenever some fields have been resolved.
  • The BatchSize will increase progressively
  • The quantity of records per round trip will increase
  • The total duration will decrease compared to Solution 2
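The query-refinement step can be sketched as a query builder that drops resolved fields; the object and field names below are hypothetical:

```python
# Sketch of Solution 3's refinement: once a field's last used date is
# known, drop it from the SOQL so the next query() gets a bigger batch.

def build_query(fields, object_name="Account", date_field="LastModifiedDate"):
    """Build a SOQL query over the fields that are still unresolved."""
    select = ", ".join([date_field] + sorted(fields))
    return (f"SELECT {select} FROM {object_name} "
            f"ORDER BY {date_field} DESC")

remaining = {"Color__c", "Size__c", "Shape__c"}
print(build_query(remaining))

# After Color__c and Size__c are resolved, re-query with fewer fields:
remaining -= {"Color__c", "Size__c"}
print(build_query(remaining))
# SELECT LastModifiedDate, Shape__c FROM Account ORDER BY LastModifiedDate DESC
```

Each refined query selects fewer fields, so the server can return more records per batch, which is where the throughput gain comes from.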

Impacts:
+ Optimizes the number of records retrieved, as you can stop the process as soon as you have the information for all your fields
+ Fewer API calls than Solution 2
+ Quicker than Solution 2, as you increase the throughput
- Still a long initial query, plus some additional delay when refining the query
[Figure: Solution 3 timeline. Multiple queries that are progressively faster; accelerating throughput reduces the total time.]
The code for Solution 3 is available here in the Salesforce Playground
Comparing all solutions:
[Figure: Comparison of Solutions 1, 2, and 3: API usage, network usage, and end of processing for each.]

Solution 4 - server processing

Same query optimization as Solution 3, but executed as anonymous Apex on the server.
Impacts:
+ Capability to process many more records in one round trip. Theoretically up to 50k, but limited by CPU time: in practice fewer than 10k records
+ Near-immediate query thanks to the "LIMIT 10k" clause. No need to wait for the snapshot on the DB server
+ Optimized network: only the results are transmitted, not the data
+ Optimized API usage, thanks to the huge batch size
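As a rough, back-of-the-envelope illustration of why the batch size dominates API usage (the record count is an assumption):

```python
# Assumed 1M records to scan in the worst case (no early stop).
records = 1_000_000

calls_solution2 = records // 200      # batch size ~200 with many fields queried
calls_solution4 = records // 10_000   # up to ~10k records per Apex round trip

print(calls_solution2, calls_solution4)
# 5000 100
```

Even before the early-stop and query-refinement savings, moving the scan server-side cuts the round trips by roughly two orders of magnitude.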

Technical tips to develop Solution 4

Use anonymous Apex; there is no need to deploy or inject anything in the org
  • executeAnonymous() from the SOAP API
    Very efficient: retrieves the Apex debug logs as the result
  • executeAnonymous using the Tooling API. Requires lots of API calls:
    1. POST to /tooling/sobjects/traceFlag to define the debug log level
    2. Execute the anonymous Apex code
    3. Get the Id of the last log from the ApexLog Tooling object
    4. Retrieve the debug log content by querying the ApexLog object's Body field

How do we get the result of the Apex processing on the client side? The Apex code needs to prepare a JSON dump using System.debug(). The client then needs to retrieve the debug logs, filter the custom debug statements to regenerate the JSON, and use it.
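The client-side filtering step can be sketched like this; the `FIELDSCAN:` marker and the simplified log format are assumptions, not the author's actual conventions:

```python
# Sketch: extract the JSON payload that the Apex code dumped with
# System.debug() from the raw debug log text.

import json

MARKER = "FIELDSCAN:"  # hypothetical prefix emitted by our System.debug() calls

def extract_result(debug_log):
    """Collect the marked USER_DEBUG lines and reassemble them into JSON."""
    chunks = []
    for line in debug_log.splitlines():
        if "|USER_DEBUG|" in line and MARKER in line:
            chunks.append(line.split(MARKER, 1)[1])
    if not chunks:
        return None
    return json.loads("".join(chunks))

log = (
    '10:00:01.0 (100)|USER_DEBUG|[12]|DEBUG|FIELDSCAN:{"Color__c":\n'
    '10:00:01.0 (101)|USER_DEBUG|[13]|DEBUG|FIELDSCAN: "2018-06-01"}\n'
    '10:00:01.1 (200)|CODE_UNIT_FINISHED|execute_anonymous_apex\n'
)
print(extract_result(log))
# {'Color__c': '2018-06-01'}
```

Using a distinctive marker lets the client ignore all the other log lines the Apex runtime emits around the payload.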
The Apex code is dynamically generated on the client to implement the progressive query enhancement, progressively removing from the query the fields that have already been found.
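Generating the anonymous Apex on the client could look like the sketch below; the template is a simplified assumption, not the author's actual code:

```python
# Sketch: build anonymous Apex that scans up to 10k records and records
# the most recent LastModifiedDate at which each remaining field was
# non-null, then dumps the result as JSON via System.debug().

def generate_apex(fields, object_name="Account"):
    select = ", ".join(["LastModifiedDate"] + fields)
    soql = (f"SELECT {select} FROM {object_name} "
            "ORDER BY LastModifiedDate DESC LIMIT 10000")
    checks = "".join(
        f"    if (!found.containsKey('{f}') && r.get('{f}') != null) "
        f"found.put('{f}', (Datetime) r.get('LastModifiedDate'));\n"
        for f in fields)
    return ("Map<String, Datetime> found = new Map<String, Datetime>();\n"
            f"for (SObject r : Database.query('{soql}')) {{\n"
            f"{checks}"
            "}\n"
            "System.debug('FIELDSCAN:' + JSON.serialize(found));\n")

print(generate_apex(["Color__c", "Size__c"]))
```

On each iteration, the client regenerates this script with the already-found fields removed, so both the SOQL and the per-record checks shrink as the process converges.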
[Figure: Solution 4 timeline. Immediate query performance, no timeouts, accelerating throughput reducing the total time.]
The code for Solution 4 is available here in the Salesforce Playground
Synthesis of all solutions: Solution 4 is the best one.
[Figure: Comparison of all four solutions: API usage, network usage, and end of processing for each.]

 
© 2018 Jean-Luc Antoine, All Rights Reserved