Creative way of coding

Administration use case

As a Salesforce administrator, I want to identify the fields that have not been used since a long time.
My object is
I want to do this for multiple objects

Constraints

I need to

Large Date Volume orgs are difficult to query


The way you are using the API can impact performance: using http/1.1, KeepAlive, gzip compression. This is out of scope of this blog.

Reducing the scope

We will not analyze:
We must configure the Date field to be used; CreatedDate or LastModifiedDate? Impact on migration on those fields?

Solution 1 - download + process

Use any tool such as Dataloader. Parse locally the resulting CSV file.
Impacts:
+ Moderate API usage
- Long time to prepare the file on server side
- Long time to download the file
- Downloading everything even if the first records tells you all fields are used.
Waste of resource and time.

DurationAPI usageSolution 1Network usageSolution 1End ofprocessing 1Qty*Run the query*Prepare the resultingfile for download*Apicalls to identifywhen it is ready fordownloadDownloadLocalprocessing*Not efficient if all thefields have beenrecently used as westill downloadeverything

Solution 2 - API + continuous process

Use the API to run your query. Do a queryMore() until you find the last used date for all fields. You receive your records in bulk of BatchSize. The more you query fields, the smaller your BatchSize is: 250 fields can drive to a batchsize of 200 records while 3 fields can give you 2000 records in a batch
Impacts
+ Optimizing the number of records retrieved as you can stop the process as soon as you have the information for all your fields
- Lots of API consumption (small batch size, huge number of records)
- Long duration because lots of API round trips + long initial query
DurationAPI usage Solution 2Network usage 2End ofprocessing 2QtyServer timebefore firstresultN x (processing+queryMore())*Can have timeouts*Slow when all fieldshave been recentlyused*Not efficient for thelast fields to be foundas we retrieve all fields

Comparing both solutions
DurationAPI usage Solution 2Network usage 2API usageSolution 1Network usageSolution 1End ofprocessing 1End ofprocessing 2Qty

Solution 3 - refined query

Use the API to run your query. Do a queryMore() until you find the last used date for all fields, or do a new query() with less fields if you found some fields
Impacts
+ Optimizing the number of records retrieved as you can stop the process as soon as you have the information for all your fields
+ Less API calls than Solution 2
+ Quicker compared to Solution 2 as you increase the throughput
- Still long initial query, and some when refining the query
DurationAPI usage Solution 3Network usage Solution 3End ofprocessing 3Qty*Multiple queries thatare progressively faster*Accelerating throughput,reducing the time
The code for Solution 3 is available here in the Salesforce Playground
Comparing all solutions:
DurationAPI usage Solution 2Network usage Solution 2API usageSolution 1Network usageSolution 1End ofprocessing 1End ofprocessing 2QtyAPI 3Network 3End 3

Solution 4 - server processing

Same query optimization as Solution 3. Executed as ApexAnonymous on the Server
Impacts
+ Capability to process much more records in 1 roundtrip. Theorically up to 50k, but limited to CPU time -> less than 10k records
+ Immediate query because of the "limit 10k". No need to wait for the snapshot on the DB server
+ Optimized network. Only results are transmitted, not the data
+ Optimized API, because of the huge batch size

Technical tips to develop Solution 4

Use ApexAnonymous; no need to deploy/inject anything in the org
How to get the result of Apex processing on client side? The Apex Code needs to prepare a JSON dump using system.debug(). The client need to retrieve the debug logs, filter the custom debug statements to regenerate the JSON, then use it.
Apex Code is dynamically autogenerated to have the progressive query enhancement. Removing progressively from the query the fields that have been found
DurationAPI usageSolution 4Network usageSolution 4End ofprocessing 4Qty*Immediate queryperformance*No timeout*Accelerating throughput,reducing the time
The code for Solution 4 is available here in the Salesforce Playground
Synthesis of all solutions, number 4 is the best one.
DurationAPI usage Solution 2Network usage Solution 2API 1Network usageSolution 1End ofprocessing 1End ofprocessing 2QtyAPI 3Network 3End 3End 4Net4API 4