In this article I want to explain an issue concerning the JPA OneToMany relationship which I was faced with in one of my own projects. Working with object -orientated languages like java we often model relationships between object in various cases. One of such relationships is the OnToMany relationship. For example an object ‘Server’ may have a relationship to an object ‘Configuration’. To make the ‘Configuration’ object generic we can model the JPA object in Java like this:
@javax.persistence.Entity public class Configuration implements java.io.Serializable { private static final long serialVersionUID = 1L; @Id @GeneratedValue private BigInteger id; public String itemValue; public String itemName; @SuppressWarnings("unused") private Configuration() { } public Configuration(String name, String value) { itemValue = value; itemName = name; } }
The generic Configuration object is quite simple an consists of a key-value pair.
Now lets assume that the Server object contains a List of configuration objects. In Java this JPA object would look like this:
@javax.persistence.Entity public class Server implements java.io.Serializable { private static final long serialVersionUID = 1L; private String id; private String type; private List<Configuration> configurations; public Server() { } @Id public String getId() { return id; } protected void setId(String aID) { id = aID; } public String getType() { return type; } public void setType(String type) { this.type = type; } @OneToMany public List<Configuration> getConfigurations() { if (configurations == null) configurations = new Vector<Configuration>(); return configurations; } public void setConfigurations(List<Configuration> configurations) { this.configurations = configurations; } }
We use only the default behavior of JPA. The Server Object we created now is something typical what we have often in object orientated languages. Its the concept of a Map interface. A Server object can have one or many Configuration objects.
The Database Schema
Now lets take a look on the database schema which is generated by the JPA OR-Mapper automatically:
Table »public.configuration« Column | Type | Attribute -----------+------------------------+----------- id | bigint | not null itemname | character varying(255) | itemvalue | character varying(255) | Indexes: "textitem_pkey" PRIMARY KEY, btree (id) Foreign key references : TABLE "server_configuration" CONSTRAINT "fk_server_configuration_configurations_id" FOREIGN KEY (configurations_id) REFERENCES configuration(id) Table »public.server« Column | Type | Attribute ----------+------------------------+----------- id | character varying(255) | not null type | character varying(255) | Indexes: "entity_pkey" PRIMARY KEY, btree (id) Foreign key references: TABLE "entity_textitem" CONSTRAINT "fk_entity_textitem_entity_id" FOREIGN KEY (entity_id) REFERENCES entity(id) Table »public.server_configuration« Column | Type | Attribute --------------+------------------------+----------- entity_id | character varying(255) | not null textitems_id | bigint | not null Indexes: "server_configuration_pkey" PRIMARY KEY, btree (server_id, configuration_id) Foreignkey-Constraints: "fk_server_configuration_server_id" FOREIGN KEY (server_id) REFERENCES server(id) "fk_server_configuration_configurations_id" FOREIGN KEY (configurations_id) REFERENCES confiuration(id)
What’s happen here is, that JPA constructs a so called Join-Table to map the relationship between the server and the configuration. A Join-Table is no bad idea, but selects can become complicated. Assume we have the following data:
Table »public.configuration« id | itemname | itemvalue ----+------------------+--------------------------- 1 | os | Linux 2 | harddisk | SSD 3 | os | Windows 4 | harddisk | SSD
Table »public.server« id | type ----+---------- 14cd0def79b-c3826b4 | virtual 14cd1002f8e-268132c | hardware
Table »public.server_configuration« server_id | configuration_id ----------------------+-------------- 14cd0def79b-c3826b4 | 1 14cd0def79b-c3826b4 | 2 14cd1002f8e-268132c | 3 14cd1002f8e-268132c | 4
So far everything looks fine. We have two server objects – one virtual and one hardware box. And each server has two configuration entries – the OS and the harddisk type.
The Problem
Now lets see what can happen with our JPA implementation on the Database side when we try to select data.
We want to select all servers which are running on linux and use a SSD harddisk. In JPQL (the query language for JPA) this looks like:
SELECT server FROM Server AS server JOIN server.configurations AS t1 JOIN server.configurations AS t2 WHERE server.type='virtual' AND t1.itemName = 'os' AND t1.itemValue = 'Linux' AND t2.itemName = 'harddisk' AND t2.itemValue = 'SSD'
What we need here is a JOIN to select the different key-value pairs. A sub-select is not possible, because our Configuration object has no reference to the Server Objects (per default we use a unidirectional relationship).
The SQL Statement for this JPQL we look like this:
SELECT DISTINCT t1.ID AS a1, t1.TYPE AS a4 FROM SERVER_CONFIGURATION t4, SERVER_CONFIGURATION t3, CONFIGURATION t2, CONFIGURATION t0 SERVER t1, WHERE ((((( (t1.TYPE = $1) AND (t0.ITEMNAME = $2)) AND (t0.ITEMVALUE = $3)) AND (t2.ITEMNAME = $4)) AND (t2.ITEMVALUE = $5)) AND (((t3.SERVER_ID = t1.ID) AND (t0.ID = t3.CONFIGURATIONs_ID)) AND ((t4.SERVER_ID = t1.ID) AND (t2.ID = t4.CONFIGURATIONs_ID)) ))
It looks very strange because JPA has to use the Join-Table to join the Server with the Configuration. And the statement contains two self-join clauses because we are interested in two key-value pairs. Even if the statement looks weird it doesn’t need to be a big problem. A performance issue only occurs if we have a lot for objects in our database. For example if we have more than 7000 configuration objects, this statement can aggregate internal a resultset with 49.000.000 records.
The Solution
To reduce the complexity of the select statements we can avoid the generation of the join table by defining a JoinColumn:
@javax.persistence.Entity public class Configuration implements java.io.Serializable { private static final long serialVersionUID = 1L; @Id @GeneratedValue private BigInteger id; public String itemValue; public String itemName; @Column(name = "ENTITY_ID") private String entityId; @SuppressWarnings("unused") private Configuration() { } public Configuration(String name, String value) { itemValue = value; itemName = name; } }
@javax.persistence.Entity public class Server implements java.io.Serializable { private static final long serialVersionUID = 1L; private String id; private String type; private List<Configuration> configurations; public Server() { } @Id @Column(name = "ID") public String getId() { return id; } protected void setId(String aID) { id = aID; } public String getType() { return type; } public void setType(String type) { this.type = type; } @OneToMany(fetch = FetchType.LAZY) @JoinColumn(name = "ENTITY_ID", referencedColumnName = "ID") public List<Configuration> getConfigurations() { if (configurations == null) configurations = new Vector<Configuration>(); return configurations; } public void setConfigurations(List<Configuration> configurations) { this.configurations = configurations; } }
With the @JoinColumn annotation the database schema will now look like this:
Table »public.configuration« Column | Type | Attribute -----------+------------------------+----------- id | bigint | not null server_id | character varying(255) | itemname | character varying(255) | itemvalue | character varying(255) | Indexes: "textitem_pkey" PRIMARY KEY, btree (id) Fremdschlüssel-Constraints: "fk_textitem_entity_id" FOREIGN KEY (entity_id) REFERENCES entity(id) Table »public.server« Column | Type | Attribute ----------+-----------------------------+----------- id | character varying(255) | not null type | character varying(255) | Indexes: "entity_pkey" PRIMARY KEY, btree (id) foreign keys from: TABLE "textitem" CONSTRAINT "fk_textitem_entity_id" FOREIGN KEY (entity_id) REFERENCES entity(id)
The Join table is no longer needed, because the reference is part of the detail table (configuration).
The table entries of the configuration table will look like in the following example:
id | server_id | itemname | itemvalue ----+----------------------+----------+---------------------- 1 | 14cd0def79b-c3826b4 | os | Linux 2 | 14cd0def79b-c3826b4 | harddisk | SSD 3 | 14cd1002f8e-268132c | os | Windows 4 | 14cd1002f8e-268132c | harddisk | SSD
If we now use the same JPQL statement as before the SQL statement translation looks like this:
SELECT DISTINCT t1.ID AS a1, t1.TYPE AS a4 FROM CONFIGURATION t0, CONFIGURATION t2, SERVER t1 WHERE ((((((t1.TYPE = $1) AND (t0.ITEMNAME = $2)) AND (t0.ITEMVALUE = $3)) AND (t2.ITEMNAME = $4)) AND (t2.ITEMVALUE = $5)) AND ((t0.SERVER_ID = t1.ID) AND (t2.SERVER_ID = t1.ID)))
This will reduce the complexity and the response time from the database.
But one important benefit of the new JoinColumn is that the @OneToMany relationship is now bidirectional. It is possible to navigate from the configuration table back to the server table. For this reason it is now possible to avoid the inner joins and use sub-selects instead. See the following JPQL example:
SELECT server FROM Server server WHERE server.type='virtual' AND server.id IN ( SELECT t.serverId FROM Configuration AS t WHERE t.itemName="os" AND t.itemValue="Linux" ) AND server.id IN ( SELECT t.serverId FROM Configuration AS t WHERE t.itemName="harddisk" AND t.itemValue="SSD" )
And this will result in the following SQL statement:
SELECT DISTINCT t0.ID AS a1, t0.TYPE AS a4 FROM SERVER t0 WHERE (((t0.TYPE = $1) AND t0.ID IN ( SELECT t1.ENTITY_ID FROM TEXTITEM t1 WHERE ((t1.ITEMNAME = $2) AND (t1.ITEMVALUE = $3))) ) AND t0.ID IN ( SELECT t2.ENTITY_ID FROM TEXTITEM t2 WHERE ((t2.ITEMNAME = $4) AND (t2.ITEMVALUE = $5))) )
As a result this will increase the performance dramatically.