JPA batch inserts with Hibernate & Spring Data

JPA batch inserts (aka bulk inserts) may seem trivial at first. However, if you’re not careful, you won’t see the performance gains you expect even though your application works “just fine”. If you follow the guidelines below, your JPA batch inserts should be blazingly fast, though.

JPA batch inserts with Hibernate

All the material presented in this section is a summary of the official Hibernate documentation on batch processing. It starts with an example of how not to do JPA batch inserts:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i=0; i<100000; i++) {
  Customer customer = new Customer(.....);
  session.save(customer);
}
tx.commit();
session.close();

This will most likely throw an OutOfMemoryError before the loop is done. That is because Hibernate caches all the newly inserted Customer instances in the session-level cache (the first-level cache, i.e. the persistence context).

The first step to fix this is to enable JDBC batching with Hibernate. Set the hibernate.jdbc.batch_size property to a “sensible” value, commonly between 10 and 50.
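
As a minimal sketch, assuming a programmatic Hibernate bootstrap via org.hibernate.cfg.Configuration (the property works just as well in hibernate.cfg.xml, persistence.xml, or, for Spring Boot, application.properties):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

// Loads hibernate.cfg.xml, then sets the JDBC batch size programmatically.
Configuration configuration = new Configuration().configure();
configuration.setProperty("hibernate.jdbc.batch_size", "20");
SessionFactory sessionFactory = configuration.buildSessionFactory();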

Then you need to update your code to flush and clear the session at regular intervals. ‘Regular’ in this respect means at the same interval at which the underlying JDBC implementation batches your insert statements:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
  Customer customer = new Customer(.....);
  session.save(customer);
  if ( (i + 1) % 20 == 0 ) { // 20, same as the JDBC batch size (i + 1 avoids flushing after the very first insert)
    // flush a batch of inserts and release memory
    session.flush();
    session.clear();
  }
}
// Flush one last time to catch those beyond that last full batch.
session.flush();
session.clear();
tx.commit();
session.close();

Verify that your JPA batch inserts work fine

Hibernate may fool you if you look at the SQL statements it dumps, assuming you have enabled those (there is a nice StackOverflow answer related to this). I learned to trust the trace messages of org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl (or org.hibernate.jdbc.AbstractBatcher for Hibernate < v4.0) rather than the SQL log statements. So, make sure your logging framework is configured to log messages of that class at TRACE level.
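
With Logback as the logging backend, for example, that boils down to a logger declaration like the following in logback.xml (a sketch; other logging frameworks have equivalent settings):

<logger name="org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl" level="TRACE"/>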

Furthermore, if your data set is just slightly more complicated than in the above example, you may not see any JDBC batching at all.

Customer customer = new Customer(.....);
X x = new X(...);
customer.setX(x);
// Note: we are adding an X to the customer, so that object
// needs to be persisted as well.
session.save(customer);

This results in SQL statements like:

insert into Customer values (...)
insert into X values (...)

The problem is that Hibernate looks at each SQL statement and checks whether it is the same statement as the previously executed one. If it is, and the batch_size limit has not been reached yet, Hibernate adds it to the current JDBC batch. With statements like the example above, however, Hibernate sees alternating insert statements and flushes an individual insert statement for each record processed.

To fix this, you need to set hibernate.order_inserts=true and hibernate.order_updates=true.
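
Again as a sketch, reusing the Configuration object from the snippet above (for Spring Boot, the same properties go into application.properties prefixed with spring.jpa.properties.):

// Order inserts/updates so statements for the same table end up next
// to each other and can be batched together.
configuration.setProperty("hibernate.order_inserts", "true");
configuration.setProperty("hibernate.order_updates", "true");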

JPA batch inserts with Spring Data JPA

A casual observer could be fooled into thinking that Spring Data JPA offers JPA batch inserts out of the box, transparently behind the scenes. It’s true that CrudRepository does have a save(Iterable) method that calls save(Entity) in a loop. However, since it neither flushes nor clears the session, it suffers from the problems explained above. I use the following code to work around that:

@PersistenceContext
private EntityManager entityManager;

@Value("${hibernate.jdbc.batch_size}")
// @Value("${spring.jpa.properties.hibernate.jdbc.batch_size}") for Spring Boot
private int batchSize;

public <T extends MyClass> Collection<T> bulkSave(Collection<T> entities) {
  final List<T> savedEntities = new ArrayList<T>(entities.size());
  int i = 0;
  for (T t : entities) {
    savedEntities.add(persistOrMerge(t));
    i++;
    if (i % batchSize == 0) {
      // Flush a batch of inserts and release memory.
      entityManager.flush();
      entityManager.clear();
    }
  }
  // Flush one last time to catch those beyond that last full batch.
  entityManager.flush();
  entityManager.clear();
  return savedEntities;
}

private <T extends MyClass> T persistOrMerge(T t) {
  if (t.getId() == null) {
    entityManager.persist(t);
    return t;
  } else {
    return entityManager.merge(t);
  }
}
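
For completeness, a minimal usage sketch: the method name importCustomers is made up for illustration, and it assumes Customer extends MyClass and lives next to bulkSave() in the same Spring bean. The important detail is that flush() requires an active transaction, so bulkSave() must be called within one:

import java.util.Collection;
import org.springframework.transaction.annotation.Transactional;

@Transactional // flush() and clear() require an active transaction
public void importCustomers(Collection<Customer> customers) {
  // bulkSave() flushes and clears every batchSize entities, so memory
  // consumption stays bounded even for very large imports.
  bulkSave(customers);
}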

15 thoughts on “JPA batch inserts with Hibernate & Spring Data”

  1. Hi,

    I was going through this post, but I was unable to get anything out of it. You just put the things/questions together, but where is the solution?

    Please reply.

    Thanks,
    Adeel

  2. Ok, the post is called “JPA batch inserts (using Hibernate and Spring Data JPA)”,
    but I cannot see a manual for how to do it using Hibernate and Spring Data;
    I only see the part that explains the Hibernate configuration for it.

    Can you please give an explanation or a manual for how to do JPA batch inserts (using Hibernate and Spring Data JPA)?

    Thank you.
    PS
    I agree with Adeel Ahmad and I understand what he asked about. As far as I understand, he is asking the same question as I am.

    1. Hhhm, ok…thanks for your feedback. You need to read the articles I linked to in order to understand what the problem is. Would you rather have me repeat the solutions documented in the other articles?

  3. Hi,

    Thanks for your interesting links.

    I understand that from time to time it’s necessary to empty the first-level cache and send everything to the database:


    session.flush();
    session.clear();

    However, since Spring Data JPA repositories are used here, only a flush() method from JpaRepository is exposed. Is a call to this method strictly equivalent to the above two calls?

    JpaRepository does not expose a clear() method, and SimpleJpaRepository never ever calls the clear() method of its internal JPA EntityManager.

    Check the source code: https://github.com/spring-projects/spring-data-jpa/blob/master/src/main/java/org/springframework/data/jpa/repository/support/SimpleJpaRepository.java

    Are you aware of these implementation details? If so, how did you make sure that the flush() and clear() of the underlying Hibernate session are effectively called?

  4. I think CrudRepository’s iterable save method does actually persist up to a batch size before flushing. You might have to set @BatchSize(size = 50) on your entity as well as hibernate.jdbc.batch_size, but I’m noticing marked improvements when inserting 1000 records, and when debugging through the code I think it behaves differently.

  5. I like your article because it gives the answer to my question: can I insert data quickly using Hibernate? Thanks.

  6. Hi,

    Nice article, thanks for that. Could you also suggest how to handle exceptions when a batch insert/update fails in JPA? What will actually happen and what’s a good way to handle that?

    Thanks & Regards,
    Dharam

  7. If we set hibernate.jdbc.batch_size=100 as a property, will this not take care of the logic you show in the code above?

    And if we are writing the code you show, where you save a set of records, then flush and also clear the session (which is great), do we still need to set the Hibernate JDBC batch property? I think we don’t, as we are flushing the session at the desired size and clearing it, which is like handling the entire batch mechanism ourselves.

    Little confused 🙁

  8. Hello Marcel, thank you for your article.

    Would this work as well if my entity A is complex and has a one-to-many relationship with another entity B?
    (One A entity can have more than 100 Bs linked to it.)

    Thank you very much!
