Bulk API provides a programmatic option to quickly retrieve data from and load data into Salesforce.
It is based on REST principles and is optimized for managing large data volumes.
But if the table has more than 10 million records, the process can time out or hit errors.
This is where you need PK chunking: it automatically splits the query into smaller chunks, making the process easier and faster.
Salesforce supports custom indexes on custom fields, which help locate rows without scanning
every row in the database.
An index points to a row of data.
It uses the indexed columns to identify the data row without scanning the full table.
Salesforce IDs:
Querying by a record's ID is the fastest way to find that record in the database.
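For illustration, here is a minimal sketch of such a selective lookup by record ID through the REST query endpoint (Python with the requests library; the instance URL, API version, access token, and record ID are placeholder assumptions, not values from this post):

import requests

# Assumption: replace the instance URL, API version, token, and ID with real values.
resp = requests.get(
    "https://yourInstance.salesforce.com/services/data/v52.0/query/",
    headers={"Authorization": "Bearer <accessToken>"},
    params={"q": "SELECT Id, Name FROM Account WHERE Id = '0013000000AbCdE'"},
)
print(resp.json()["records"])  # at most one row; the Id filter uses the index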
Bulk API provides a programmatic option to load or retrieve your org's data to and from Salesforce.
Bulk API is optimized for retrieving large sets of data.
We can use it to query, queryAll, insert, update, or upsert records.
If your table is massive, bulk queries usually time out or return errors because the process is hard to complete.
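As a sketch of how such a bulk query is submitted (Python with the requests library; the instance URL, API version, and session ID are placeholder assumptions):

import requests

INSTANCE = "https://yourInstance.salesforce.com"  # assumption: your org's instance URL
SESSION_ID = "<sessionId>"                        # assumption: session ID from a prior login call

# Job definition for a bulk query on Account, returning CSV results.
job_xml = """<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <operation>query</operation>
  <object>Account</object>
  <contentType>CSV</contentType>
</jobInfo>"""

resp = requests.post(
    f"{INSTANCE}/services/async/52.0/job",
    headers={"X-SFDC-Session": SESSION_ID,
             "Content-Type": "application/xml; charset=UTF-8"},
    data=job_xml,
)
print(resp.text)  # jobInfo XML containing the new job's ID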
What is PK Chunking?
-> PK stands for Primary Key.
-> Feature enabled in Spring '15.
-> Automatically splits the query based on the Primary Key.
-> Executes a query for each chunk and returns the data.
-> Makes large queries manageable in Bulk API.
The query is divided into smaller queries, and each query retrieves a smaller portion of the data
in parallel, thereby making the process easier and faster.
Extract queries are run with successive ID boundaries, and the number of records to be retrieved by each query is called
the chunk size.
Each query retrieves at most the chunk size in records.
ex:
The first query retrieves the records between a specified starting ID and the
starting ID plus the chunk size, the next query retrieves the next chunk of records,
and the process continues until all the data is retrieved.
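A rough sketch of the idea (illustrative Python only; Salesforce computes the real ID boundaries internally, and the IDs below are made up):

# Build one sub-query per pair of successive ID boundaries.
def chunk_queries(soql_base, boundaries):
    return [
        f"{soql_base} WHERE Id >= '{lower}' AND Id < '{upper}'"
        for lower, upper in zip(boundaries, boundaries[1:])
    ]

# Hypothetical successive boundaries, each roughly chunkSize records apart.
boundaries = ["001300000000000AAA", "00130000000132GAAQ", "00130000000264WAAQ"]
for q in chunk_queries("SELECT Name FROM Account", boundaries):
    print(q)
# SELECT Name FROM Account WHERE Id >= '001300000000000AAA' AND Id < '00130000000132GAAQ'
# SELECT Name FROM Account WHERE Id >= '00130000000132GAAQ' AND Id < '00130000000264WAAQ'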
When to use PK Chunking?
-> Objects with more than 10 million records, to improve performance.
-> When a bulk query consistently times out.
Supported Objects
-> Standard (not all objects)
-> Custom
-> Sharing tables (if the parent object is supported)
-> History tables (if the parent object is supported)
Common Errors during Data Management
-> Query not 'selective' enough
Non-selective query against large object type (more than 100,000 rows).
-> Query takes too long
No response from the server.
-> Time limit exceeded
Your request exceeded the time limit for processing.
-> Too much data returned in query
Too many query rows: 50001
Remoting response size exceeded maximum of 15 MB.
How to enable PK Chunking?
We need to add certain parameters to the Bulk API request headers to enable PK chunking.
Parameters
1. Field name : Sforce-Enable-PKChunking
2. Field values : TRUE enables PK chunking, FALSE disables it.
3. chunkSize : Number of records in each chunk. Default 100,000, maximum 250,000.
4. parent : Parent object to use when PK chunking is enabled for queries on sharing objects.
5. startRow : A 15- or 18-character record ID; the lower boundary for the first chunk.
ex : Sforce-Enable-PKChunking: chunkSize=50000; startRow=00130000000xEftMGH
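Continuing the Bulk API sketch from earlier, enabling PK chunking is just one extra header on the job-creation request (INSTANCE, SESSION_ID, and job_xml are the same placeholder assumptions as in that sketch):

import requests

resp = requests.post(
    f"{INSTANCE}/services/async/52.0/job",
    headers={
        "X-SFDC-Session": SESSION_ID,
        "Content-Type": "application/xml; charset=UTF-8",
        # One batch is created per 50,000-record chunk, starting at the given ID.
        "Sforce-Enable-PKChunking": "chunkSize=50000; startRow=00130000000xEftMGH",
    },
    data=job_xml,
)
# Poll the job's batches and download each batch's result set
# to assemble the full extract.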
Limitations :
-> PK chunking cannot be enabled for queries with
Order By
Filtering on any Id fields
Limit clause
-> Enabling PK chunking in Data Loader is still just an idea; it is not supported yet.
-> Each chunk is processed as a separate batch that counts towards your daily batch limit.