JPA batch inserts with Hibernate & Spring Data

JPA batch inserts (aka bulk inserts) may seem trivial at first. However, if you’re not careful, you won’t see the performance gains you expect even though your application works “just fine”. If you follow the guidelines below, your JPA batch inserts should be blazingly fast, though.

JPA batch inserts with Hibernate

All the material presented in this section is a summary of the official Hibernate documentation on batch processing. It starts with an example of how not to do JPA batch inserts:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i=0; i<100000; i++) {
  Customer customer = new Customer(.....);
  session.save(customer);
}
tx.commit();
session.close();

This will most likely throw an OutOfMemoryError before the loop is done. That is because Hibernate caches all the newly inserted Customer instances in the session-level cache (the first-level cache, i.e. the persistence context).

The first step to fix this is to enable JDBC batching with Hibernate. Set the hibernate.jdbc.batch_size property to a “sensible” value, commonly between 10 and 50.
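
As a minimal sketch, assuming a programmatic Hibernate bootstrap via org.hibernate.cfg.Configuration (the property works just as well in hibernate.cfg.xml, persistence.xml, or, for Spring Boot, application.properties):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

// Loads hibernate.cfg.xml, then sets the JDBC batch size programmatically.
Configuration configuration = new Configuration().configure();
configuration.setProperty("hibernate.jdbc.batch_size", "20");
SessionFactory sessionFactory = configuration.buildSessionFactory();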

Then you need to update your code to flush and clear the session at regular intervals. ‘Regular’ in this respect means at the same interval at which the underlying JDBC implementation batches your insert statements:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
  Customer customer = new Customer(.....);
  session.save(customer);
  if ( (i + 1) % 20 == 0 ) { // 20, same as the JDBC batch size (i + 1 avoids flushing after the very first insert)
    // flush a batch of inserts and release memory
    session.flush();
    session.clear();
  }
}
// Flush one last time to catch those beyond that last full batch.
session.flush();
session.clear();
tx.commit();
session.close();

Verify that your JPA batch inserts work fine

Hibernate may fool you if you look at the SQL statements it dumps, assuming you have enabled those (there is a nice StackOverflow answer related to this). I learned to trust the trace messages of org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl (or org.hibernate.jdbc.AbstractBatcher for Hibernate < v4.0) rather than the SQL log statements. So, make sure your logging framework is configured to log messages of that class at TRACE level.
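
With Logback as the logging backend, for example, that boils down to a logger declaration like the following in logback.xml (a sketch; other logging frameworks have equivalent settings):

<logger name="org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl" level="TRACE"/>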

Furthermore, if your data set is just slightly more complicated than in the above example, you may not see any JDBC batching at all.

Customer customer = new Customer(.....);
X x = new X(...);
customer.setX(x);
// Note: we are adding an X to the customer, so that object
// needs to be persisted as well.
session.save(customer);

This results in SQL statements like:

insert into Customer values (...)
insert into X values (...)

The problem is that Hibernate looks at each SQL statement and checks whether it is the same statement as the previously executed one. If it is, and the batch_size limit has not been reached yet, Hibernate adds it to the current JDBC batch. With statements like the example above, however, Hibernate sees alternating insert statements and flushes an individual insert statement for each record processed.

To fix this, you need to set hibernate.order_inserts=true and hibernate.order_updates=true.
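
Again as a sketch, reusing the Configuration object from the snippet above (for Spring Boot, the same properties go into application.properties prefixed with spring.jpa.properties.):

// Order inserts/updates so statements for the same table end up next
// to each other and can be batched together.
configuration.setProperty("hibernate.order_inserts", "true");
configuration.setProperty("hibernate.order_updates", "true");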

JPA batch inserts with Spring Data JPA

A casual observer could be fooled into thinking that Spring Data JPA offers JPA batch inserts out of the box, transparently behind the scenes. It’s true that CrudRepository does have a save(Iterable) method that calls save(Entity) in a loop. However, since it neither flushes nor clears the session, it suffers from the problems explained above. I use the following code to work around that:

@PersistenceContext
private EntityManager entityManager;

@Value("${hibernate.jdbc.batch_size}")
// @Value("${spring.jpa.properties.hibernate.jdbc.batch_size}") for Spring Boot
private int batchSize;

public <T extends MyClass> Collection<T> bulkSave(Collection<T> entities) {
  final List<T> savedEntities = new ArrayList<T>(entities.size());
  int i = 0;
  for (T t : entities) {
    savedEntities.add(persistOrMerge(t));
    i++;
    if (i % batchSize == 0) {
      // Flush a batch of inserts and release memory.
      entityManager.flush();
      entityManager.clear();
    }
  }
  // Flush one last time to catch those beyond that last full batch.
  entityManager.flush();
  entityManager.clear();
  return savedEntities;
}

private <T extends MyClass> T persistOrMerge(T t) {
  if (t.getId() == null) {
    entityManager.persist(t);
    return t;
  } else {
    return entityManager.merge(t);
  }
}
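
For completeness, a minimal usage sketch: the method name importCustomers is made up for illustration, and it assumes Customer extends MyClass and lives next to bulkSave() in the same Spring bean. The important detail is that flush() requires an active transaction, so bulkSave() must be called within one:

import java.util.Collection;
import org.springframework.transaction.annotation.Transactional;

@Transactional // flush() and clear() require an active transaction
public void importCustomers(Collection<Customer> customers) {
  // bulkSave() flushes and clears every batchSize entities, so memory
  // consumption stays bounded even for very large imports.
  bulkSave(customers);
}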

15 thoughts on “JPA batch inserts with Hibernate & Spring Data”

  1. Hi,

    I was going through this post, but I was unable to get anything out of it. You just put the things/questions together, but where is the solution?

    Please reply.

    Thanks,
    Adeel

  2. Ok, the post is called “JPA batch inserts (using Hibernate and Spring Data JPA)”,
    but I cannot see a manual for how to do it using Hibernate and Spring Data;
    I only see the part that explains the Hibernate configuration for it.

    Can you please give an explanation or a manual for how to do JPA batch inserts (using Hibernate and Spring Data JPA)?

    Thank you.
    PS
    I agree with Adeel Ahmad and I understand what he asked about. As far as I understand, he is asking the same question as I am.

    1. Hhhm, ok…thanks for your feedback. You need to read the articles I linked to in order to understand what the problem is. Would you rather have me repeat the solutions documented in the other articles?

  3. Hi,

    Thanks for your interesting links.

    I understand that from time to time it’s necessary to empty the first-level cache and send everything to the database:


    session.flush();
    session.clear();

    However, since Spring Data JPA repositories are used here, only a flush() method from JpaRepository is exposed. Is a call to this method strictly equivalent to the above two calls?

    JpaRepository does not expose a clear() method, and SimpleJpaRepository never ever calls the clear() method of its internal JPA EntityManager.

    Check the source code: https://github.com/spring-projects/spring-data-jpa/blob/master/src/main/java/org/springframework/data/jpa/repository/support/SimpleJpaRepository.java

    Are you aware of these implementation details? If so, how did you make sure that the flush() and clear() of the underlying Hibernate session are effectively called?

  4. I think CrudRepository’s iterable save method does actually persist up to a batch size before flushing. You might have to set @BatchSize(size = 50) on your entity as well as hibernate.jdbc.batch_size, but I’m noticing marked improvements when inserting 1000 records, and when debugging through the code I think it behaves differently.

  5. I like your article because it gives the answer to my question: can I insert data quickly using Hibernate? Thanks.

  6. Hi,

    Nice article, thanks for that. Could you also suggest how to handle exceptions when a batch insert/update fails in JPA? What will actually happen and what’s a good way to handle that?

    Thanks & Regards,
    Dharam

  7. If we set hibernate.jdbc.batch_size=100 as a property, will this not take care of the logic you show in the code above?

    And if we are writing the code you show, where you save a set of records, then flush and also clear the session (which is great), do we still need to set the Hibernate JDBC batch property? I think we don’t, as we are flushing the session at the desired size and clearing it, which is like handling the entire batch mechanism ourselves.

    Little confused 🙁

  8. Hello Marcel, thank you for your article.

    Would this work as well if my entity A is complex and has a one-to-many relationship with another entity B?
    (One A entity can have more than 100 Bs linked to it.)

    Thank you very much!
