Sunday, 6 February 2022

Salesforce Bulk API Using PK Chunking

Bulk API provides a programmatic option to quickly retrieve data from and load data into Salesforce.

It is based on REST principles and is optimized for managing large data volumes.

But if a table has more than 10 million records, the process tends to time out or hit errors.

This is where you need PK chunking. PK chunking automatically splits the query into smaller chunks, making the process easier and faster.


Salesforce supports custom indexes on custom fields, which help us easily locate rows without scanning every row in the database.

An index points to rows of data.

It uses the indexed columns to identify a data row without scanning the full table.


Salesforce IDs:

Using the record ID in a query is the fastest way to find a record in the database, because the ID is the table's primary key.
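
For example, a SOQL query that filters on the ID field hits the primary-key index directly (the ID below is a made-up placeholder, not a real record):

SELECT Id, Name FROM Account WHERE Id = '001XXXXXXXXXXXXXXX'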


Bulk API provides a programmatic option to load or retrieve your org's data to and from Salesforce.

Bulk API is optimized for retrieving large sets of data.

We can use it to query, queryAll, insert, update, or upsert records.
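
As a rough sketch of what this looks like over REST (assuming Bulk API 1.0; the instance URL, API version 53.0, and session ID below are placeholders you would obtain from your own login flow):

import requests

# Placeholders: obtain these from your own OAuth or SOAP login flow.
instance_url = "https://yourInstance.salesforce.com"
session_id = "<SESSION_ID>"

# Bulk API 1.0 job definition: a CSV query job against Account.
job_xml = """<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <operation>query</operation>
  <object>Account</object>
  <contentType>CSV</contentType>
</jobInfo>"""

resp = requests.post(
    f"{instance_url}/services/async/53.0/job",
    headers={
        "X-SFDC-Session": session_id,
        "Content-Type": "application/xml; charset=UTF-8",
    },
    data=job_xml,
)
print(resp.status_code, resp.text)  # response XML contains the new job ID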


If your table is massive, bulk queries usually time out or throw errors because the platform struggles to complete the process in one pass.


What is PK Chunking?

-> PK stands for Primary Key.

-> The feature was enabled in Spring '15.

-> Automatically splits the query based on the Primary Key.

-> Executes a query for each chunk and returns the data.

-> Makes large queries manageable in Bulk API.


The query is divided into smaller queries, and each query retrieves a smaller portion of the data in parallel, thereby making the process easier and faster.


Extract queries are run with successive ID boundaries, and the number of records to be retrieved by each query is called the chunk size.

Each query retrieves at most as many records as the chunk size.


ex:

The first query retrieves the records between a specified starting ID and the starting ID plus the chunk size; the next query retrieves the next chunk of records, and the process continues until all the data is retrieved.
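
Concretely, the single extract query becomes a series of ID-range queries against the primary-key index. The boundary IDs below are illustrative only, not real IDs:

SELECT Name FROM Account WHERE Id >= '001300000000000' AND Id < '00130000000132G'
SELECT Name FROM Account WHERE Id >= '00130000000132G' AND Id < '00130000000264W'
...

Each range covers one chunk's worth of IDs, so every query stays small enough to complete quickly.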


When to use PK Chunking?

-> Objects with more than 10 million records, to improve performance.

-> When a bulk query consistently times out.


Supported Objects 


-> Standard (not all objects)

-> Custom

-> Sharing tables (if the parent is supported)

-> History tables (if the parent is supported)


Common Errors during Data Management


-> Query not 'selective' enough

Non-selective query against large object type (more than 100,000 rows).

-> Query takes too long

No response from the server.

-> Time limit exceeded

Your request exceeded the time limit for processing.

-> Too much data returned in query

Too many query rows: 50001.

Remoting response size exceeded maximum of 15 MB.


How to enable PK Chunking?


We need to add certain parameters to the Bulk API request header to enable PK chunking.


Parameters


1. Field name: Sforce-Enable-PKChunking

2. Field values: TRUE (enable PK chunking), FALSE (disable PK chunking)

3. chunkSize: Number of records in each chunk. Default: 100,000; maximum: 250,000.

4. parent: Parent object, used when PK chunking is enabled for queries on sharing objects.

5. startRow: A 15- or 18-character record ID; the lower boundary for the first chunk.


ex : Sforce-Enable-PKChunking: chunkSize=50000; startRow=00130000000xEftMGH
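
A minimal sketch of the same job-creation call with PK chunking enabled, reusing the placeholder login values from the earlier example and the header values from the line above:

import requests

# Placeholders from your own login flow, as in the earlier sketch.
instance_url = "https://yourInstance.salesforce.com"
session_id = "<SESSION_ID>"

job_xml = """<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <operation>query</operation>
  <object>Account</object>
  <contentType>CSV</contentType>
</jobInfo>"""

resp = requests.post(
    f"{instance_url}/services/async/53.0/job",
    headers={
        "X-SFDC-Session": session_id,
        "Content-Type": "application/xml; charset=UTF-8",
        # The PK chunking header from the example above.
        "Sforce-Enable-PKChunking": "chunkSize=50000; startRow=00130000000xEftMGH",
    },
    data=job_xml,
)
print(resp.text)  # job info; Salesforce then creates one batch per chunk

For a sharing object, the parent parameter names the object whose IDs drive the chunk boundaries, e.g. Sforce-Enable-PKChunking: chunkSize=250000; parent=Account when querying AccountShare. Once the job is split, the original query batch is marked Not Processed, and the results are collected from the chunk batches.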


Limitations:


-> PK chunking cannot be enabled for queries that contain:

an ORDER BY clause,

filters on any ID fields, or

a LIMIT clause.

-> Enabling PK chunking in Data Loader is not yet supported; it remains an open idea.

-> Each chunk is processed as a separate batch that counts towards your daily batch limit.
