JPA and @OneToMany – How to Optimize?

In this article I want to explain an issue concerning the JPA OneToMany relationship which I was faced with in one of my own projects. Working with object -orientated languages like java we often model relationships between object in various cases. One of such relationships is the OnToMany relationship. For example an object ‘Server’ may have a relationship to an object ‘Configuration’. To make the ‘Configuration’ object generic we can model the JPA object in Java like this:

@javax.persistence.Entity
public class Configuration implements java.io.Serializable {
 private static final long serialVersionUID = 1L;

 @Id
 @GeneratedValue
 private BigInteger id;

 public String itemValue;
 public String itemName;

 @SuppressWarnings("unused")
 private Configuration() {
 }

 public Configuration(String name, String value) {
 itemValue = value;
 itemName = name;
 }
}

The generic Configuration object is quite simple an consists of a key-value pair.

Now lets assume that the Server object contains a List of configuration objects. In Java this JPA object would look like this:

@javax.persistence.Entity
public class Server implements java.io.Serializable {
 private static final long serialVersionUID = 1L;
 private String id;
 private String type;
 private List<Configuration> configurations;

 public Server() {
 }

 @Id
 public String getId() {
 return id;
 }

 protected void setId(String aID) {
 id = aID;
 }

 public String getType() {
 return type;
 }

 public void setType(String type) {
 this.type = type;
 }

 @OneToMany
 public List<Configuration> getConfigurations() {
 if (configurations == null)
 configurations = new Vector<Configuration>();
 return configurations;
 }

 public void setConfigurations(List<Configuration> configurations) {
 this.configurations = configurations;
 }
}

We use only the default behavior of JPA. The Server Object we created now is something typical what we have often in object orientated languages. Its the concept of a Map interface. A Server object can have one or many Configuration objects.

The Database Schema

Now lets take a look on the database schema which is generated by the JPA OR-Mapper automatically:

Table »public.configuration«
 Column    | Type | Attribute 
-----------+------------------------+-----------
 id        | bigint                 | not null
 itemname  | character varying(255) | 
 itemvalue | character varying(255) | 
Indexes:
 "textitem_pkey" PRIMARY KEY, btree (id)
Foreign key references :
 TABLE "server_configuration" CONSTRAINT "fk_server_configuration_configurations_id" FOREIGN KEY (configurations_id) REFERENCES configuration(id)


 Table »public.server«
 Column   | Type                   | Attribute 
----------+------------------------+-----------
 id       | character varying(255) | not null
 type     | character varying(255) | 
Indexes:
 "entity_pkey" PRIMARY KEY, btree (id)
Foreign key references:
 TABLE "entity_textitem" CONSTRAINT "fk_entity_textitem_entity_id" FOREIGN KEY (entity_id) REFERENCES entity(id)


 Table »public.server_configuration«
 Column       | Type                    | Attribute 
--------------+------------------------+-----------
 entity_id    | character varying(255) | not null
 textitems_id | bigint                 | not null
Indexes:
 "server_configuration_pkey" PRIMARY KEY, btree (server_id, configuration_id)
Foreignkey-Constraints:
 "fk_server_configuration_server_id" FOREIGN KEY (server_id) REFERENCES server(id)
 "fk_server_configuration_configurations_id" FOREIGN KEY (configurations_id) REFERENCES confiuration(id)

What’s happen here is, that JPA constructs a so called Join-Table to map the relationship between the server and the configuration. A Join-Table is no bad idea, but selects can become complicated. Assume we have the following data:

Table »public.configuration«
 id | itemname         | itemvalue 
----+------------------+---------------------------
 1  | os               | Linux
 2  | harddisk         | SSD
 3  | os               | Windows
 4  | harddisk         | SSD

Table »public.server«
 id | type            
----+----------
14cd0def79b-c3826b4 | virtual 
14cd1002f8e-268132c | hardware
Table »public.server_configuration«
 server_id            | configuration_id 
----------------------+--------------
 14cd0def79b-c3826b4  | 1
 14cd0def79b-c3826b4  | 2
 14cd1002f8e-268132c  | 3
 14cd1002f8e-268132c  | 4

So far everything looks fine. We have two server objects – one virtual and one hardware box. And each server has two  configuration entries – the OS and the harddisk type.

The Problem

Now lets see what can happen with our JPA implementation on the Database side when we try to select data.

We want to select all servers which are running on linux and use a SSD harddisk. In JPQL (the query language for JPA) this looks like:

 SELECT server FROM Server AS server
 JOIN server.configurations AS t1
 JOIN server.configurations AS t2
 WHERE server.type='virtual'
  AND t1.itemName = 'os'
  AND t1.itemValue = 'Linux'
  AND t2.itemName = 'harddisk'
  AND t2.itemValue = 'SSD'

What we need here is a JOIN to select the different key-value pairs. A sub-select is not possible, because our Configuration object has no reference to the Server Objects (per default we use a unidirectional relationship).

The SQL Statement for this JPQL we look like this:

 SELECT DISTINCT t1.ID AS a1, t1.TYPE AS a4
 FROM SERVER_CONFIGURATION t4, 
      SERVER_CONFIGURATION t3, 
      CONFIGURATION t2, 
      CONFIGURATION t0
      SERVER t1,       
 WHERE ((((( (t1.TYPE = $1)
         AND (t0.ITEMNAME = $2)) AND (t0.ITEMVALUE = $3)) 
         AND (t2.ITEMNAME = $4)) AND (t2.ITEMVALUE = $5)) 
         AND (((t3.SERVER_ID = t1.ID)
         AND (t0.ID = t3.CONFIGURATIONs_ID))
         AND ((t4.SERVER_ID = t1.ID) 
         AND (t2.ID = t4.CONFIGURATIONs_ID))
       ))

It looks very strange because JPA has to use the Join-Table to join the Server with the Configuration. And the statement contains two self-join clauses because we are interested in two key-value pairs. Even if the statement looks weird it doesn’t need to be a big problem. A performance issue only occurs if we have a lot for objects in our database. For example if we have more than 7000 configuration objects, this statement can aggregate internal a resultset with 49.000.000  records.

The Solution

To reduce the complexity of the select statements we can avoid the generation of the join table by defining a JoinColumn:

@javax.persistence.Entity
public class Configuration implements java.io.Serializable {

 private static final long serialVersionUID = 1L;

 @Id
 @GeneratedValue
 private BigInteger id;

 public String itemValue;

 public String itemName;

 @Column(name = "ENTITY_ID")
 private String entityId;

 @SuppressWarnings("unused")
 private Configuration() {
 }

 public Configuration(String name, String value) {
 itemValue = value;
 itemName = name;
 }

}
@javax.persistence.Entity
public class Server implements java.io.Serializable {

 private static final long serialVersionUID = 1L;
 private String id;
 private String type;

 private List<Configuration> configurations;

 public Server() {
 }

 @Id
 @Column(name = "ID")
 public String getId() {
 return id;
 }

 protected void setId(String aID) {
 id = aID;
 }

 public String getType() {
 return type;
 }

 public void setType(String type) {
 this.type = type;
 }

 @OneToMany(fetch = FetchType.LAZY)
 @JoinColumn(name = "ENTITY_ID", referencedColumnName = "ID")
 public List<Configuration> getConfigurations() {
 if (configurations == null)
 configurations = new Vector<Configuration>();
 return configurations;
 }

 public void setConfigurations(List<Configuration> configurations) {
 this.configurations = configurations;
 }
}

With the @JoinColumn annotation the database schema will now look like this:

 Table »public.configuration«
 Column    | Type                   | Attribute 
-----------+------------------------+-----------
 id        | bigint                 | not null
 server_id | character varying(255) | 
 itemname  | character varying(255) | 
 itemvalue | character varying(255) | 
Indexes:
 "textitem_pkey" PRIMARY KEY, btree (id)
Fremdschlüssel-Constraints:
 "fk_textitem_entity_id" FOREIGN KEY (entity_id) REFERENCES entity(id)


 Table »public.server«
 Column   | Type                        | Attribute 
----------+-----------------------------+-----------
 id       | character varying(255)      | not null
 type     | character varying(255)      | 
Indexes:
 "entity_pkey" PRIMARY KEY, btree (id)
foreign keys from:
 TABLE "textitem" CONSTRAINT "fk_textitem_entity_id" FOREIGN KEY (entity_id) REFERENCES entity(id)

The Join table is no longer needed, because the reference is part of the detail table (configuration).

The table entries of the configuration table will look like in the following example:

 id | server_id            | itemname | itemvalue 
----+----------------------+----------+----------------------
 1  | 14cd0def79b-c3826b4  | os       | Linux
 2  | 14cd0def79b-c3826b4  | harddisk | SSD
 3  | 14cd1002f8e-268132c  | os       | Windows
 4  | 14cd1002f8e-268132c  | harddisk | SSD

If we now use the same JPQL statement as before the SQL statement translation looks like this:

SELECT DISTINCT t1.ID AS a1, t1.TYPE AS a4
 FROM CONFIGURATION t0, 
      CONFIGURATION t2, 
      SERVER t1
 WHERE ((((((t1.TYPE = $1) AND (t0.ITEMNAME = $2))
 AND (t0.ITEMVALUE = $3)) AND (t2.ITEMNAME = $4))
 AND (t2.ITEMVALUE = $5)) AND ((t0.SERVER_ID = t1.ID)
 AND (t2.SERVER_ID = t1.ID)))

This will reduce the complexity and the response time from the database.

But one important benefit of the new JoinColumn is that the @OneToMany relationship is now bidirectional. It is possible to navigate from the configuration table back to the server table. For this reason it is now possible to avoid the inner joins and use sub-selects instead. See the following JPQL example:

SELECT server FROM Server server 
WHERE server.type='virtual' 
 AND server.id IN (
 SELECT t.serverId FROM Configuration AS t
 WHERE t.itemName="os" AND t.itemValue="Linux"
 )
 AND server.id IN (
 SELECT t.serverId FROM Configuration AS t
 WHERE t.itemName="harddisk" AND t.itemValue="SSD"
 )

And this will result in the following SQL statement:

 SELECT DISTINCT t0.ID AS a1, t0.TYPE AS a4
 FROM SERVER t0
 WHERE (((t0.TYPE = $1)
  AND t0.ID IN (
      SELECT t1.ENTITY_ID FROM TEXTITEM t1 
      WHERE ((t1.ITEMNAME = $2) AND (t1.ITEMVALUE = $3)))
  ) 
 AND t0.ID IN (
      SELECT t2.ENTITY_ID FROM TEXTITEM t2
      WHERE ((t2.ITEMNAME = $4) AND (t2.ITEMVALUE = $5)))
 )

As a result this will increase the performance dramatically.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.