I work on the Hibernate and Infinispan teams at JBoss, where I look after Lucene integration in the products we support: making it easier to use, integrating it with well-known APIs and patterns, and making it scale better. I love clean, well-performing code.
I've been an early adopter of cloud deployments, scaling Lucene to a huge number of requests on EC2 using Hibernate Search, and after that I worked with Sourcesense to make JIRA clusterable via Infinispan. I have also been a trainer on Seam and Hibernate courses.
The Hibernate Search 4.3 iteration reached its first milestone: version 4.3.0.Alpha1 is now available for download from Sourceforge.net and Maven repositories.
The theme for the 4.3 development cycle is clustering: we want to make it better, faster and easier to set up multiple nodes using Hibernate Search in both traditional bare-metal clusters and clouds. For now we're focusing on the JGroups and Infinispan integrations, but other contributions in the area are very welcome.
JGroups backend
Besides some minor bugfixes and improved logging messages, the big news is automatic master election.
Rather than having to set up some jgroupsSlave instances and a single jgroupsMaster instance with different configurations, you can now simply specify jgroups as the backend on all your instances and they will elect a single master. The main benefit of this new feature is that when a master fails, the remaining nodes can automatically elect a new one. Beware, though: the failover approach is still experimental and, for one thing, it won't clean up stale locks the dead master may have left behind.
### backend configuration
hibernate.search.default.worker.backend = jgroups
Also we introduced some more configuration options for the power user: see the reference documentation for all details.
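To picture what "electing a single master" means, here is a toy model in plain Java. It is only a sketch under a stated assumption: every node sees the same ordered member list and the first live member acts as master. The class and method names are hypothetical; this is not the Hibernate Search or JGroups implementation, and it deliberately ignores the stale lock problem mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT the Hibernate Search or JGroups implementation.
// Assumption: all nodes agree on one ordered member list, and the
// first live member of that list acts as the master.
class MasterElectionSketch {

    private final List<String> members = new ArrayList<>();

    // A node joining the cluster is appended to the shared member list.
    void join(String node) {
        members.add(node);
    }

    // A failed node is dropped from the list; nothing else is cleaned up.
    void fail(String node) {
        members.remove(node);
    }

    // Returns the current master, or null when the cluster is empty.
    String master() {
        return members.isEmpty() ? null : members.get(0);
    }

    public static void main(String[] args) {
        MasterElectionSketch cluster = new MasterElectionSketch();
        cluster.join("node-A");
        cluster.join("node-B");
        cluster.join("node-C");
        System.out.println("master: " + cluster.master());
        cluster.fail("node-A");
        System.out.println("master after failure: " + cluster.master());
    }
}
```

The point of the sketch is that all nodes run identical code and configuration, and the role of master follows from cluster membership rather than from a per-node setting.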
Updated JBoss modules
The JBoss Modules were updated to match JBoss EAP 6.1 and JBoss AS 7.2 (now renamed WildFly), and now also include the Infinispan Directory for easy usage of Infinispan when deploying in the application server.
The modules are available as a zip in Maven repositories, or can be downloaded from Sourceforge.net; more on how to use them is described in this section of the documentation.
Components upgraded
Many dependencies were upgraded, and integration points are now expecting the following versions:
- JBoss EAP 6.1
- Hibernate ORM 4.2.x
- JGroups 3.2.x
- Infinispan 5.2.x
- Lucene 3.6.x (this didn't actually change compared to Hibernate Search 4.2, but it's good to be reminded of the version)
What's next?
In the next few weeks we will be working on better Infinispan integration: easier setup and more configuration examples. We also have some open tasks about Spatial queries; the API especially needs some polishing.
The complete list of changes can be found on the JIRA release notes.
Links recap:
- Download it from Sourceforge or via Maven
- Get in touch on the forums or on the mailing list, or join us for a chat on IRC
- The issue tracker is JIRA and all code is on GitHub
Last week I returned from my trip to Bengaluru, where we had one of our great developer conferences.
JUDCon India 2013
As always at these events, the best part was the people attending: a good mix of new users and experts, all sharing a very healthy curiosity and not intimidated at all. They raised a terrific number of questions and discussions, and gave me a lot of things to think about on my long trip home.
Presentations
I had the honor to present several topics:
- Hibernate Search: queries for Hibernate and Infinispan
- Infinispan in 50 minutes
- Cross data center replication with Infinispan
- Measuring performance and capacity planning for Data Grids
- Participating in the JBoss experts panel
The talk about Hibernate Search was a last minute addition: by shuffling the agenda a bit we could insert the additional subject and given the amount of nice feedback I'm happy we did.
The big denormalization problem
An expert Hibernate Search user asked me what would happen with a domain model connecting User types to Addresses when you have many Users and the city name changes. He actually knew what would happen, but was looking for alternatives to compensate for the problem: since Lucene requires denormalization, all User instances in the Lucene index need to be updated, triggering a reload of all Users living in that particular city. Yes, that might be a problem! But that is not something that happens frequently in model schemas, right? I stated that in this example it would take a city changing its name! Well, that caused a good amount of laughter, as Bangalore had just changed its official name to the old traditional Bengaluru. Since they were using Hibernate Search and this behaviour was unexpected when the city changed name, with more than 8 million inhabitants the public registry had some servers working very hard!
Obviously this needed specific testing, and possibly better warnings on our part. Such problems are a natural consequence of denormalization and need to be addressed with ad-hoc solutions; in this case I'd suggest using a synonym and registering the two names as the same in the context of searching, by configuring Synonym support in the Analyzer in use: the city name would then need a single record change in the database and no reindexing would be needed.
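The synonym idea can be illustrated without any Lucene code. The following is a minimal sketch in plain Java: a hand-rolled stand-in for analyzer synonym support, with hypothetical class and method names that are not part of Lucene or Hibernate Search. It only shows why index entries written under the old name keep matching queries for the new one.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustration only: a hand-rolled stand-in for analyzer synonym support.
// The class and method names are hypothetical, not a Lucene or
// Hibernate Search API.
class CitySynonymSketch {

    // Maps each registered spelling to the form used for matching.
    private final Map<String, String> canonical = new HashMap<>();

    // Registers "name" as meaning the same as "sameAs" when searching.
    void addSynonym(String name, String sameAs) {
        String target = normalize(sameAs);
        canonical.put(normalize(name), canonical.getOrDefault(target, target));
    }

    // Applied to both indexed text and query text, like an analyzer.
    String analyze(String term) {
        String key = normalize(term);
        return canonical.getOrDefault(key, key);
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        CitySynonymSketch analyzer = new CitySynonymSketch();
        analyzer.addSynonym("Bengaluru", "Bangalore");
        // Entries indexed under the old name are left untouched...
        String indexed = analyzer.analyze("Bangalore");
        // ...and a query for the new name resolves to the same term.
        String queried = analyzer.analyze("Bengaluru");
        System.out.println(indexed.equals(queried)); // prints "true"
    }
}
```

With the mapping applied at analysis time, the rename becomes one configuration entry plus one database row, instead of a rewrite of every User document in the index.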
Hibernate OGM
While I'm part of the OGM team, I had no need to talk about OGM as well because there were other speakers on the subject already. I greatly enjoyed listening to the other presenters, Ramya Subash and Shekhar Gulati: they were extremely well prepared, and even with the most complex questions there was no need for me to help out.
To all attending, and especially all those I've been talking to: thank you so much, it was very interesting and I very much appreciate all the feedback. As always, feel free to get more questions flowing on our Hibernate forums or Infinispan forum, and you're all welcome to participate more by sending tests or patches.
The latest Hibernate Search beta v. 4.2.0.Beta2 is available!
In this iteration we introduce Apache Tika integration, Spatial Queries are now able to sort on distance, and as usual a list of less noticeable improvements.
Apache Tika integration
Apache Tika allows you to extract text from, and index, any kind of document: MP3 metadata, PDF text, office files. You can annotate a Blob field if loading the media files from a database, or have a String field point to a resource or file path.
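import java.sql.Blob;

import javax.persistence.Basic;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;

import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TikaBridge;

@Entity
@Indexed
public class Book {
    Integer id;
    Blob content;

    @Id @GeneratedValue
    public Integer getId() {
        return id;
    }
    public void setId(Integer id) {
        this.id = id;
    }

    @Lob @Basic(fetch = FetchType.LAZY)
    @Field @TikaBridge // <- just add the TikaBridge as an adaptor to make the Blob indexable
    public Blob getContent() {
        return content;
    }
    public void setContent(Blob content) {
        this.content = content;
    }
}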
The @TikaBridge annotation supports more options to tune the kind of text extraction; refer to the documentation for more details. Consider this feature experimental for now: we didn't add an option to make the text extraction asynchronous yet, so we might need to change the API to introduce that.
Spatial Queries sorted by distance
Thanks to all of Nicolas Helleringer's work, it's now easy to:
- Return the distance from the search center to each hit (via a projection)
- Apply a sort criterion on the distance
Let's see an example from our large collection of self-documenting examples (the testsuite!):
QueryBuilder builder = em.getSearchFactory().buildQueryBuilder().forEntity( Cafe.class ).get();
org.apache.lucene.search.Query luceneQuery = builder.spatial()
    .onCoordinates( "location" )
    .within( 100, Unit.KM )
    .ofLatitude( centerLatitude )
    .andLongitude( centerLongitude )
    .createQuery();

FullTextQuery hibQuery = em.createFullTextQuery( luceneQuery, Cafe.class );
Sort distanceSort = new Sort( new DistanceSortField( centerLatitude, centerLongitude, "location" ) );
hibQuery.setSort( distanceSort );
hibQuery.setProjection( FullTextQuery.THIS, FullTextQuery.SPATIAL_DISTANCE );
hibQuery.setSpatialParameters( centerLatitude, centerLongitude, "location" );
List results = hibQuery.getResultList();
Several more reasons to upgrade
- Apache Lucene upgraded to version 3.6.1
- JMS and JMX integrations improved
- The MassIndexer now correctly applies EntityIndexingInterceptor
- Lower memory usage
- Spatial Queries improved
- Improved some classloaders for better integration with other libraries
The complete list of changes can be found here. Check the Migration Guide.
It has been a while since 4.2.0.Beta1 but the summer is over, so try these quickly as we'll move to the Final soon! As always, feedback is very welcome.
The usual links:
- Download it from Sourceforge or via Maven artifacts
- Get in touch on the forums or on the mailing list, or join us for a chat on IRC
- Get the spotlight in the next release: have a look at JIRA and get the code from GitHub
Spatial Queries, aka GET COFFEE NOW!
This release introduces support for Spatial Queries, a superb feature, especially when combined with traditional full-text.
As usual, we strive to boost your productivity. With Spatial Queries you can express the equivalent of:
Find me a coffee shop SHOULD In [2 miles] radius from [location] MUST NOT Starbucks
Enjoy! Of course that's a fake meta-language; the actual syntax requires the new Spatial DSL, and all details and examples are in the new Spatial chapter of the documentation.
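To make the meaning of that meta-language query concrete, here is a rough sketch in plain Java with no Lucene involved. It is not the Spatial DSL: the class, the cafe names and the coordinates are all made up for illustration, and for simplicity the radius clause is treated as a hard filter rather than an optional SHOULD clause. Distances use the haversine formula with an approximate Earth radius.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: plain Java, no Lucene. Names and coordinates are made up.
class CoffeeSearchSketch {

    static final double EARTH_RADIUS_MILES = 3958.8; // approximate mean radius

    static class Cafe {
        final String name;
        final double lat;
        final double lon;

        Cafe(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Great-circle distance in miles between two points (haversine formula).
    static double distanceMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    // Cafes within radiusMiles of the center, excluding one brand, nearest first.
    static List<Cafe> find(List<Cafe> all, double lat, double lon,
                           double radiusMiles, String excluded) {
        List<Cafe> hits = new ArrayList<>();
        for (Cafe c : all) {
            boolean near = distanceMiles(lat, lon, c.lat, c.lon) <= radiusMiles;
            boolean banned = c.name.equalsIgnoreCase(excluded);
            if (near && !banned) {
                hits.add(c);
            }
        }
        hits.sort(Comparator.comparingDouble(
                (Cafe c) -> distanceMiles(lat, lon, c.lat, c.lon)));
        return hits;
    }

    public static void main(String[] args) {
        List<Cafe> all = Arrays.asList(
                new Cafe("Corner Brew", 52.5300, 13.4050),
                new Cafe("Starbucks", 52.5205, 13.4055),
                new Cafe("Far Roast", 52.6200, 13.4050),
                new Cafe("Bean There", 52.5210, 13.4060));
        for (Cafe c : find(all, 52.5200, 13.4050, 2.0, "Starbucks")) {
            System.out.println(c.name);
        }
    }
}
```

A real spatial index avoids computing the distance to every document, which is what the loop above does; the sketch only mirrors the query semantics.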
Big thanks to Nicolas Helleringer, who coded it all, wrote the documentation and endured all our criticism for many months of work. Now it's your turn to try it out, comment, and maybe improve on it?
Lucene 3.6
Apache Lucene 3.6 was released, and once more we stay up to date. Nothing changed in the API we expose, once more protecting you from the changes in Lucene code, but it's worth considering reindexing, or at least using the compatibility options.
More updates
As usual, it includes some minor improvements; for a detailed list see the changelogs on JIRA, and keep an eye on the Migration Guide.
Download Hibernate Search 4.2.0.Beta1
- Download it from Sourceforge or via Maven artifacts.
Forums
- The user forums are the place to get help on how to use it, where you can help other users, or search for past questions.
Want to help?
We maintain a list of low hanging fruits which should be easy starting points to get anyone started to improve Hibernate Search: have a look into issues suited for new contributors (maybe), the wiki pages and of course feel free to get in touch if you're stuck.
Berlin Buzzwords Barcamp
The main conference was introduced by a barcamp event on Sunday afternoon and night, in a fascinating location!
c-base
The barcamp was at c-base, which I initially had mistaken for a creative design company or an underground disco. Kosch, an Infinispan user and contributor, welcomed me with a nice glass of mead and corrected my blind guess: it actually is a massive space ship being built by hackers in the underground of the city. This place oozes hard-core hacker culture and counts 400 members; it is huge and full of self-made droids, LDAP-verifying doors and advanced equipment all over, up to self-made 3D printers, scanners, arcade video games and of course connections to The Matrix.
meetings and discussions
There were a lot of people from the Apache communities. I spent almost all the evening talking with Lucene developers, but also listened to the experiences people had with HBase, Cassandra, MongoDB, Solr, ElasticSearch, and of course our very own Hibernate Search, Infinispan and JBoss AS.
A recurring subject was the need to use multiple of these datastores in a better integrated way, mostly it was about integrating {bigdataX} with Lucene.
SQL vs. NoSQL
So this place was packed with NoSQL zealots. You can imagine this strong pack, excited and a bit drunk too. Perfect timing for some members of the SQL standards to show up! They had some reasonable objections to the NoSQL expression, most notably that everything these alternative engines would need could be standardized in the new revision of the specification. The answer from Chris Harris was hilarious: you're missing the point.
The Berlin Buzzwords 2012 conference
The main conference started with many interesting talks, beginning with the keynote from my colleague Leslie Hawthorn, but the buzz of the hallway track continued very strongly for me. I met amazing people and had interesting chats with a lot of users of our technologies. It was easy to meet a lot of known community members: I talked with many users whose names I don't remember, but also with Shay Banon (Elastic Search founder), Grant Ingersoll (Chief Scientist at Lucid Imagination, and well known contributor of several Apache projects), Uwe Schindler (Lucene), Robert Muir (Lucene, also Lucid Imagination), Michael Busch (Lucene, at Twitter), Nick Burch (Apache Tika, Alfresco), Christian Moen (Lucene, creator of the awesome Japanese analyzers), Karel Minarik (Ruby client for Elastic Search), Simon Willnauer (Lucene, and conference organizer), Chris Harris (MongoDB), Martijn van Groningen (JOIN implementations on Lucene) and colleagues Mircea Markus (Infinispan) and Lukáš Vlček (search.jboss.org, Elastic Search).
Updatable fields for Lucene
Andrzej Bialecki had an amazing talk on the codecs coming in Lucene 4, and explained how fields could be made update-able. There are some patches already but there is still lots of work to do, and he is inviting users to help out: LUCENE-3837.
JOINs in Lucene
Martijn van Groningen is working on JOIN functionality in Lucene. It would be very interesting if someone could experiment with support for it in Hibernate Search: such a feature is highly requested and would be very useful for Hibernate OGM too.
How is Infinispan different than key/value store X?
This was a question people frequently asked me. The main point, besides supporting transactions, is that it focuses on in-memory storage while still preserving high availability. It's a good idea to use it together with {your favourite other store here} for disk persistence. Why? Our tests just breached one million operations/sec, and there is still much we can improve...
The people there
The conference was great, as it somehow managed to keep marketing low and keep the spotlight on the developers, the people, and the stuff that really matters. An example of this was that most talks were in 20-minute slots, forcing speakers to focus very strictly on the juicy aspects and leave everything else out for face-to-face discussions in the halls. That worked amazingly well for me. I'm glad for all the chats I had with everyone, so thank you all!