One of the reasons we use relational database technology is that existing RDBMS implementations provide extremely mature, scalable and robust concurrency control. This means much more than simple read/write locks. For example, databases that use locking are built to scale efficiently when a particular transaction obtains /many/ locks - this is called /lock escalation/. On the other hand, some databases (for example, Oracle and PostgreSQL) don't rely on read locks at all - instead, they use the multiversion concurrency model, in which readers work against a consistent snapshot and do not block writers. This sophisticated approach to concurrency is designed to achieve higher scalability than is possible using traditional locking models. Databases even let you specify the required level of transaction isolation, allowing you to trade isolation for scalability.
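To make the isolation trade-off concrete, here is a minimal sketch of how a Java application can ask the database for a particular isolation level through plain JDBC. The class and method names (IsolationSketch, describe, inTransaction, Work) are made up for illustration; only the java.sql.Connection calls and constants are standard API, and which levels a given database actually supports varies by vendor.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper, not part of any library: it only wraps standard JDBC calls.
final class IsolationSketch {

    /** Human-readable name for a standard JDBC isolation constant. */
    static String describe(int level) {
        switch (level) {
            case Connection.TRANSACTION_NONE:             return "NONE";
            case Connection.TRANSACTION_READ_UNCOMMITTED: return "READ_UNCOMMITTED";
            case Connection.TRANSACTION_READ_COMMITTED:   return "READ_COMMITTED";
            case Connection.TRANSACTION_REPEATABLE_READ:  return "REPEATABLE_READ";
            case Connection.TRANSACTION_SERIALIZABLE:     return "SERIALIZABLE";
            default:
                throw new IllegalArgumentException("unknown isolation level: " + level);
        }
    }

    /** A unit of work that runs inside one database transaction. */
    interface Work {
        void run(Connection connection) throws SQLException;
    }

    /**
     * Runs the work in a transaction at the requested isolation level.
     * The database, not the Java tier, enforces the isolation.
     */
    static void inTransaction(Connection connection, int level, Work work) throws SQLException {
        int previousLevel = connection.getTransactionIsolation();
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setTransactionIsolation(level); // set before the transaction starts
        connection.setAutoCommit(false);
        try {
            work.run(connection);
            connection.commit();
        } catch (SQLException | RuntimeException e) {
            connection.rollback();
            throw e;
        } finally {
            // Restore the settings so a pooled connection is handed back unchanged.
            connection.setAutoCommit(previousAutoCommit);
            connection.setTransactionIsolation(previousLevel);
        }
    }
}
```

A caller would pass a Connection obtained from its own DataSource, for example IsolationSketch.inTransaction(connection, Connection.TRANSACTION_READ_COMMITTED, c -> { ... }). Choosing a lower level generally buys scalability at the price of weaker isolation, which is exactly the trade described above.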

Unfortunately, some Java persistence frameworks (especially CMP engines) assume that they can improve upon the many years of research and development that have gone into these relational systems by implementing their own concurrency control in the Java application. Usually, this takes the form of a comparatively crude locking model, with the locks held in the Java middle tier. There are three main problems with this approach. First, it subverts the concurrency model of the underlying database. If you have spent a lot of money on your Oracle installation, it seems insane to throw away Oracle's sophisticated multiversion concurrency model and replace it with a (less-scalable) locking model. Second, other (possibly non-Java) applications that share the same database are not aware of the locks. Finally, locks held in the middle tier do not naturally scale to a clustered environment. Some kind of distributed lock will be needed. At best, distributed locking will be implemented using some efficient group communication library like JGroups. At worst (for example, in OJB), the persistence framework will persist the locks to a special database table. Clearly, both of these solutions carry a heavy performance cost.

Accordingly, Hibernate was designed to /not require/ any middle-tier locks - even thread synchronization is avoided. This is perhaps the best and least-understood feature of Hibernate and is the key to why Hibernate scales well. So why do other frameworks not just let the database handle concurrency?

Well, the only good justification for holding locks in the middle tier is that we might be using a middle-tier cache. It turns out that the problem of ensuring consistency between the database and the cache is an extremely difficult one, and solutions usually do involve some use of middle-tier locking. (Incidentally, most applications that use a cache do not solve this problem correctly, even in a non-clustered environment.)

So, for example, when Hibernate integrates with JBoss Cache, the cache implementation must obtain clustered locks internally (again, using JGroups). In Hibernate, we consider it a quality-of-service concern of the cache implementation to provide this kind of functionality. We can do this because Hibernate, unlike many other persistence layers, features a two-level cache architecture. This design separates the transaction-scoped /session cache/ (which does /not/ require middle-tier locking and delegates concurrency concerns to the database) from the process- or cluster-scoped /second-level cache/ (which /may/ require middle-tier locks). So when the second-level cache is disabled for a particular class, no middle-tier lock is required. Hence, in this case, the scalability of Hibernate is limited only by the scalability of the underlying database. Our design also allows us to consider other, more sophisticated approaches to ensuring consistency between the second-level cache and database - approaches that do not require the use of middle-tier locking. I'll keep this stuff secret for now; it is an active area of investigation!
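The two-level arrangement described above can be sketched roughly as follows. This is a toy model under stated assumptions, not Hibernate's implementation: SessionSketch and SecondLevelCacheSketch are hypothetical names, rows are plain strings, the "database" is just a function, and the sketch ignores invalidation and consistency with the database, which is the genuinely hard part.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical process-scoped cache shared by all sessions.
 * Whatever concurrency control it needs is its own concern.
 */
final class SecondLevelCacheSketch {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    String get(String id) { return entries.get(id); }

    void put(String id, String row) { entries.put(id, row); }
}

/**
 * Hypothetical transaction-scoped session. It is used by a single thread,
 * so its first-level cache is a plain HashMap with no locks at all.
 */
final class SessionSketch {
    private final Map<String, String> firstLevel = new HashMap<>();
    private final SecondLevelCacheSketch secondLevel;   // null means "disabled"
    private final Function<String, String> database;    // stand-in for a real query

    SessionSketch(SecondLevelCacheSketch secondLevel, Function<String, String> database) {
        this.secondLevel = secondLevel;
        this.database = database;
    }

    String load(String id) {
        // 1. Session cache: no locking, no shared state.
        String row = firstLevel.get(id);
        if (row != null) {
            return row;
        }
        // 2. Optional shared cache: only here might middle-tier locks be involved.
        if (secondLevel != null) {
            row = secondLevel.get(id);
        }
        // 3. Otherwise go to the database, which handles concurrency itself.
        if (row == null) {
            row = database.apply(id);
            if (row != null && secondLevel != null) {
                secondLevel.put(id, row);
            }
        }
        if (row != null) {
            firstLevel.put(id, row);
        }
        return row;
    }
}
```

With the second-level cache passed as null, every session reads through to the database once per row and no shared middle-tier state exists, which mirrors the case where the scalability limit is the database alone.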

5 comments:
 
28. Jan 2004, 01:44 CET
Chris Knoll
I'm not sure it's a valid argument to say that locking in the middle tier 'subverts the concurrency control of the underlying database.' You make it sound like doing so makes the underlying database locking pointless (it does not!). When it makes sense, your middle tier can rely solely on the locking mechanisms of the underlying DB (in fact, I believe most EJB containers do this through specialized connectors).

The second point I think is even worse: 'Other 'non-java' clients that share the same database do not understand the locks'. What if I'm some ubergeek that likes to look at data using a hex editor, and instead of going through the normal database server protocols I just directly access the raw filesystem or memory and modify the data directly? Who needs all this database mumbo-jumbo overhead when I can just modify the bits directly? It's so fast! Did I just invalidate database locking because there's a 'back door' to the physical data? I think that's exactly what you are describing with these non-Java clients, and I think the moral of the story is that clients should always go through the middle tier in order to access anything on the data tier. If you don't, where do you put your security? If you put it in the middle tier, anyone who directly accesses the data tier will not be authenticated. (But this point is diverging from the topic at hand: locks in the middle tier.) If you put it in your data tier, then you have the problem of having different security schemas for different entities in the data tier... seems kinda messy.

Anyways, I am not saying your articles are without merit at all (I compliment you on your knowledge, I am sure I am far inferior to your experience, I'm just posting my impressions), and I'm suddenly very interested in Hibernate... How to implement distributed caches and locks properly has always baffled me, so I'll definitely have to pick up the book and take a look.


 
28. Jan 2004, 02:06 CET
Christian
Well, security, concurrency control and data integrity are ultimately not the job of any application middle tier (though routines and constraints might be duplicated there), but of a database management system.

Databases are centralized, ensuring your data can be shared by many applications. In other words: Your data and database will be around when no one talks about your application middle tier anymore (i.e., in 4 years).

 
28. Jan 2004, 02:41 CET
In my experience it has often been the database, and context switches to it, that turn out to be the bottleneck, since it is typically on one machine and the Java application has to make a remote request to it. One could use something expensive and difficult to configure such as Oracle RAC to scale out, but in many situations VM-level caching in the O/R layer is a much easier and cheaper solution. Of course there is the issue of stale data, but this can be partially mitigated by versioning, cache invalidation and (if possible) a period where data can be out of sync. I agree that if stale data is not acceptable, or cache invalidation is not an option, one has to go directly to the database. In the other case, though, particularly for web session related data that is pinned to a VM in a cluster, a read/write cluster cache will probably scale better. For that type of data, the cache can have a lock manager that allows multiple reads but a single write. Cache nodes should be allowed to maintain read or write locks after they complete a transaction and only give them up to the lock manager when requested. This allows a node to avoid a network request to the lock manager for multiple reads and writes, which is important for web session information since multiple operations will typically be performed by one VM due to session affinity.

For data that is not web session based, there are definitely other caching strategies that are better. However, replication to all nodes in a cluster will not scale well for session based data since data that changes will be needlessly replicated (session affinity ties it to a single VM).


The other thing this strategy buys is the ability to perform deadlock detection so request threads in the "middle tier" don't indefinitely hang.


 
01. Feb 2004, 01:50 CET
Carl Rosenberger | carl(AT)db4o.com
Excellent stuff!
There is just one problem with your way of thinking, Gavin: What about portability between database engines? If I have a mid-sized standard-software development company (for building engineering, for instance) and I want my package to be runnable on multiple databases (MySQL, Oracle, MSSQL, DB2, Sybase), what do I do? If I want my application to behave exactly the same way on all database engines, the only option I have is to do everything myself, with my own locks, in my own Java tier. It's a no-go to write adapters for all databases to attempt to take advantage of the special features.
...at least that was our conclusion, after trying for four years, it was just too expensive for us. Just the research, to get to know the individual features of all engines, was a tremendous amount of work.

Well, maybe things are different today with the availability of excellent O-R mappers like Hibernate (but I doubt it).

In comparison to my past work, I really love what I am doing now, since I hardly ever have to work around other people's bugs and compatibility trade-offs.
...and I am back at designing my own locks and there are no worries anymore, whether it's my job or not. :-)
 
10. Mar 2007, 01:58 CET
I agree with you, but if you want to forbid access to a row while a user is editing it, you need to reject other accesses to that row. With DBMS concurrency control, such concurrent accesses are managed as transactions to be executed sequentially, but they are not cancelled.

If a user is editing some information in a row (in a Java client interface, for example) and another user tries to edit it, a message should appear ('Data is being edited by another user', or similar) and the action should be cancelled. The second user has to wait until they can edit the row (when the first user finishes editing it).

In this case, I think it's necessary to lock rows in the middle tier (EJB, for example). It's my opinion and I'm sure it's not the best :)



Santi