Abstract
Note: The db usage in my case may not be typical.
H2 [Preferred Choice ***]
Installation and Deployment: very simple and well explained in the documentation.
Speed: in my case slower than MySQL, but after some tweaking it became comparable.
Administration: has nice administration tools (not as extensive as the ones for MySQL and SQL Server, but sufficient for most tasks).
Support: very nice and responsive user group.
Note: H2 has become my preferred choice (by far) for most applications where the db can fit in memory.
HSQLDB
did not work well for me:
administration tools are not user friendly
loading data is rather awkward
MySQL
Very nice db, but deployment is rather heavy
Details
I was considering HSQLDB as my primary choice for an in-memory database. After reading its history on http://en.wikipedia.org/wiki/HSQLDB :
HSQLDB is a relational database management system written in Java. It is based on Thomas Mueller's discontinued Hypersonic SQL Project.[1] He later developed H2 as a complete rewrite.
I decided to give H2 a try. The benchmark on H2's website looks quite nice. Hopefully my application will achieve similar results.
My application. I am currently running my algorithm on top of the Taste recommender engine using MySQL. A complete set of experiments, for my algorithm and the existing ones, takes 3 days on a Core 2 Duo machine. I need to try more than 20 different settings, which means at least 60 days. So I decided to try to speed it up. In addition, I don't have permission to install a db on the supercomputing cluster at my university, so an in-memory db will hopefully solve both of my problems.
Reasons for In-Memory Database
Performance
Since memory I/O is much faster than disk I/O, this should translate into a significant speedup (assuming your db can fit in memory).
Deployment
Can be run in embedded mode: all you need is Java, and you can run it locally inside your application (no additional software needs to be installed on the server, etc.).
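As a rough illustration of the embedded deployment described above, the sketch below builds the JDBC URLs for H2's embedded file mode and its memory-only mode. The class and method names (H2Urls, embeddedFile, inMemory) are made up for this example; the URL prefixes jdbc:h2: and jdbc:h2:mem: and the DB_CLOSE_DELAY setting come from the H2 documentation. Actually opening a connection with these URLs also requires the H2 jar on the classpath.

```java
/** Hypothetical helper (not part of H2) that builds H2 JDBC URLs. */
class H2Urls {
    private H2Urls() {}

    /** Embedded mode: the database is stored in local files at the given path. */
    static String embeddedFile(String path) {
        return "jdbc:h2:" + path;
    }

    /**
     * Memory-only mode: a named in-memory database.
     * With keepAlive, DB_CLOSE_DELAY=-1 keeps the data until the JVM exits
     * instead of dropping it when the last connection closes.
     */
    static String inMemory(String name, boolean keepAlive) {
        String url = "jdbc:h2:mem:" + name;
        return keepAlive ? url + ";DB_CLOSE_DELAY=-1" : url;
    }

    public static void main(String[] args) {
        System.out.println(embeddedFile("~/test"));
        System.out.println(inMemory("taste", true));
    }
}
```

Either string can be passed to setURL on the data source; nothing else has to be installed on the machine.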
Performance
Note: I am using this setup for an explorative learning algorithm; it is not a typical usage of a recommender system.
MySQL Taste
Single pass: 138 - 3,819 ms (depending on configuration)
Taste Movie Lens Notes
DB
# Create Table
CREATE TABLE taste_preferences (
user_id VARCHAR(10) NOT NULL,
item_id VARCHAR(10) NOT NULL,
preference FLOAT NOT NULL,
time_stamp VARCHAR(10),
PRIMARY KEY (user_id, item_id)
);
# Create Indexes
CREATE INDEX IDX_USER_ID ON TASTE_PREFERENCES ( USER_ID );
CREATE INDEX IDX_ITEM_ID ON TASTE_PREFERENCES ( ITEM_ID );
CREATE INDEX IDX_PREFERENCE ON TASTE_PREFERENCES ( PREFERENCE );
CREATE UNIQUE INDEX IDX_PK_USER_ITEM ON TASTE_PREFERENCES (USER_ID,ITEM_ID);
# Load data from file
INSERT INTO TASTE_PREFERENCES SELECT * FROM CSVREAD('/home/neil/tmp/u.txt');
see AL_CF.java for more details
Java
Some parts of the code are not well documented; the javadoc seems to be out of sync. I had to browse for this one for a while until I found it.
// Create DB Connection
// H2
String driverName = "org.h2.Driver";
String url = "jdbc:h2:~/test";
String user = "sa";
String pwd = "";
org.h2.jdbcx.JdbcDataSource db = new org.h2.jdbcx.JdbcDataSource();
db.setUser(user);
db.setPassword(pwd);
db.setURL(url);
Exception
java.util.NoSuchElementException: Can't retrieve more due to exception: org.h2.jdbc.JdbcSQLException: The result set is not scrollable and can not be reset. You may need to use conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY). [90128-66]
Changed com.planetj.taste.impl.model.jdbc.AbstractJDBCDataModel
* NEIL -- Modified ResultSetUserIterator to fix the following Exception:
* TODO: Do it in a better way
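The exception message above suggests requesting a scrollable result set. A minimal sketch of that suggestion, using only the standard java.sql API, follows. The helper name ScrollableStatements.prepare is hypothetical, and this is not necessarily how the modified ResultSetUserIterator solved it.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper: prepares statements whose result sets can be rewound. */
class ScrollableStatements {
    /**
     * Requests a scroll-insensitive, read-only result set, so that calls such as
     * ResultSet.beforeFirst() are allowed on the results of the statement.
     */
    static PreparedStatement prepare(Connection conn, String sql) throws SQLException {
        return conn.prepareStatement(
                sql, ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
    }
}
```

The default result set type is forward-only, which is why resetting the iterator failed.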
Exception
org.h2.jdbc.JdbcSQLException: Syntax error in SQL statement INSERT INTO TASTE_PREFERENCES SET[*] USER_ID=?, ITEM_ID=?, PREFERENCE=? ON DUPLICATE KEY UPDATE PREFERENCE=? ; expected ., (, DEFAULT, VALUES, (, SELECT, FROM; SQL statement:
INSERT INTO taste_preferences SET user_id=?, item_id=?, preference=? ON DUPLICATE KEY UPDATE preference=? [42001-66]
at org.h2.message.Message.getSQLException(Message.java:89)
at org.h2.message.Message.getSQLException(Message.java:93)
at org.h2.message.Message.getSyntaxError(Message.java:103)
at org.h2.command.Parser.getSyntaxError(Parser.java:454)
at org.h2.command.Parser.parseSelectSimple(Parser.java:1445)
at org.h2.command.Parser.parseSelectSub(Parser.java:1366)
at org.h2.command.Parser.parseSelectUnion(Parser.java:1249)
at org.h2.command.Parser.parseSelect(Parser.java:1237)
at org.h2.command.Parser.parseInsert(Parser.java:826)
at org.h2.command.Parser.parsePrepared(Parser.java:343)
at org.h2.command.Parser.parse(Parser.java:265)
at org.h2.command.Parser.parse(Parser.java:241)
at org.h2.command.Parser.prepareCommand(Parser.java:209)
at org.h2.engine.Session.prepareLocal(Session.java:213)
at org.h2.engine.Session.prepareCommand(Session.java:195)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:970)
at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:1206)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:161)
at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.prepareStatement(PoolingDataSource.java:302)
at com.planetj.taste.impl.model.jdbc.AbstractJDBCDataModel.setPreference(AbstractJDBCDataModel.java:420)
at com.planetj.taste.impl.recommender.AbstractRecommender.setPreference(AbstractRecommender.java:81)
at org.hrstc.taste.al.AL_CF.evaluate(AL_CF.java:243)
at org.hrstc.taste.al.AL_CF.main(AL_CF.java:80)
Solution:
INSERT INTO TASTE_PREFERENCES SET USER_ID='22', ITEM_ID='378', PREFERENCE='5.0' ON DUPLICATE KEY UPDATE PREFERENCE='5.0'
It was giving the above exception. I replaced it with:
INSERT INTO TASTE_PREFERENCES (user_id, item_id, preference) VALUES ('22', '378', '5.0') ON DUPLICATE KEY UPDATE PREFERENCE='4.0'
Then the ON DUPLICATE KEY UPDATE PREFERENCE='4.0' part failed, because it is not standard SQL syntax (it is specific to MySQL).
A better way is to use the MERGE command instead of db-specific variants. MERGE is part of the SQL:2003 standard, although the KEY(...) form used here is H2's own shorthand, e.g.:
MERGE INTO TASTE_PREFERENCES(user_id, item_id, preference) key(user_id,item_id) VALUES ('22', '378', '4.0')
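To make the dialect difference concrete, here is a small sketch that returns the parameterized upsert statement for each database. The class UpsertSql and its Dialect enum are hypothetical names for this example, not part of Taste, H2, or MySQL; the SQL strings mirror the statements shown above, with ? placeholders for use with a PreparedStatement.

```java
/** Hypothetical helper that picks the upsert statement for taste_preferences. */
class UpsertSql {
    enum Dialect { MYSQL, H2 }

    static String forDialect(Dialect dialect) {
        switch (dialect) {
            case MYSQL:
                // MySQL-specific: insert, or update the preference if the key exists (4 parameters).
                return "INSERT INTO taste_preferences (user_id, item_id, preference) "
                        + "VALUES (?, ?, ?) ON DUPLICATE KEY UPDATE preference = ?";
            case H2:
                // H2's MERGE with a KEY clause: updates the row matching the key, else inserts (3 parameters).
                return "MERGE INTO taste_preferences (user_id, item_id, preference) "
                        + "KEY (user_id, item_id) VALUES (?, ?, ?)";
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }
}
```

Note that the MySQL form binds the preference twice, while the H2 form binds it once, so the parameter-setting code differs as well as the SQL text.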
wrote H2JDBCDataModel.java
Performance Tweaking
I finally got H2 to run (most of the changes were migrating incompatible SQL from MySQL to standard SQL).
Its performance was rather slow:
INFO: It took ms: 2,119,766
Used memory-only mode http://www.h2database.com/html/features.html#memory_only_databases
INFO: It took ms: 140,568
That's roughly a 15-fold improvement
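As a quick check of the ratio between the two timings above, the following sketch divides the first run's time by the memory-only run's time. The class name Speedup is just for illustration; the two numbers are the ones logged above, and the result comes out at roughly 15.

```java
/** Computes how many times faster one run is than another. */
class Speedup {
    static double ratio(long beforeMs, long afterMs) {
        if (afterMs <= 0) {
            throw new IllegalArgumentException("afterMs must be positive");
        }
        return (double) beforeMs / afterMs;
    }

    public static void main(String[] args) {
        // Timings logged above: first run vs. memory-only mode.
        double r = ratio(2_119_766L, 140_568L);
        System.out.printf("Speedup: %.1fx%n", r);
    }
}
```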
Let JVM use more memory; let db use more memory
Did not help
Profiling
Installed TPTP for Eclipse, but it does not work; could not figure out why
Using NetBeans for profiling; it is throwing one of the old errors for some reason.
Fix: The problem was that the class in the jar was not correctly overwritten by NetBeans
For some reason, when running from NetBeans, userNeighbourhood.size = 0; but when running the same files from the command line it is not
Fix: The problem was that the class in the jar was not correctly overwritten by NetBeans
Performance
Surprisingly, the performance of the MySQL MyISAM and MEMORY engines is almost identical.
H2 is about 10x slower than MySQL. The difference in code between the two implementations is the upsert statement:
H2: MERGE INTO
MySQL: INSERT ... ON DUPLICATE KEY UPDATE
MySQL
MyISAM Engine
INFO: It took ms: 20,574
Mar 24, 2008 11:05:09 AM org.hrstc.taste.al.AL_CF <init>
INFO: log running..
943
Mar 24, 2008 11:05:11 AM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 11:05:11 AM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 11:05:13 AM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 11:05:13 AM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1577
Mar 24, 2008 11:05:34 AM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.8614681}]
Mar 24, 2008 11:05:34 AM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 20574
CREATE TABLE `taste`.`taste_preferences` (
`user_id` varchar(10) NOT NULL,
`item_id` varchar(10) NOT NULL,
`preference` float NOT NULL,
PRIMARY KEY (`user_id`,`item_id`),
KEY `user_id` (`user_id`),
KEY `item_id` (`item_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Memory Engine
INFO: It took ms: 19,781
Mar 24, 2008 2:55:58 PM org.hrstc.taste.al.AL_CF <init>
INFO: log running..
943
Mar 24, 2008 2:56:00 PM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 2:56:00 PM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 2:56:02 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 2:56:02 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1522
Mar 24, 2008 2:56:22 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.760892}]
Mar 24, 2008 2:56:22 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 19781
Mar 24, 2008 2:56:38 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.760892}, {MAE=0.86437464}]
Mar 24, 2008 2:56:38 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 16243
CREATE TABLE `taste`.`taste_preferences` (
`user_id` varchar(10) NOT NULL,
`item_id` varchar(10) NOT NULL,
`preference` float NOT NULL,
PRIMARY KEY (`user_id`,`item_id`),
KEY `user_id` (`user_id`),
KEY `item_id` (`item_id`)
) ENGINE=MEMORY DEFAULT CHARSET=latin1
H2
INFO: It took ms: 214,899
Mar 24, 2008 3:01:21 PM org.hrstc.taste.al.AL_CF <init>
INFO: loaded data
943
Mar 24, 2008 3:01:23 PM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 3:01:23 PM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 3:01:24 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 3:01:24 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1865
Mar 24, 2008 3:04:59 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.7894972}]
Mar 24, 2008 3:04:59 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 214899
HSQLDB
Starting DB: java -cp ./lib/hsqldb.jar org.hsqldb.Server -database.0 file:mydb -dbname.0 xdb
java -cp ./lib/hsqldb.jar org.hsqldb.util.DatabaseManager
Loading Data
keywords: hsqldb load data file csv
Use Text Table ; also see src/org/hsqldb/sample/load_binding_lu.sql
Error
Error: table not found in statement [SET TABLE SOURCE] / Error Code: -22 / State: S0002
Possible Reason: Text Tables cannot be created in memory-only databases (databases that have no script file).
Tried example from src/org/hsqldb/sample/load_binding_lu.sql
CREATE TEXT TABLE binding_tmptxt (
id integer,
name varchar(12)
);
Error: Database is memory only in statement [CREATE TEXT TABLE binding_tmptxt] / Error Code: -63 / State: S1000
Solution: this statement is not allowed for an in-memory database; start a server db instance instead
Error
When trying to sudo a server [org.hsqldb.util.DatabaseManager], I get the following error:
java.sql.SQLException: socket creation error
Solution: none
Various
x
H2 is about 10x slower than MySQL. The difference in code between the two implementations is the upsert statement:
H2: MERGE INTO
MySQL: INSERT ... ON DUPLICATE KEY UPDATE
Changed it but did not make a difference.
Disabling/enabling connection pooling did not produce a significant effect either.
Taste Recommender Code Improvement
Since performance is still not good, I rewrote the code for the methods that take too long and wrote new classes optimized for my task, as seen in the next blog posting ....
ToDo
Tracing
http://www.h2database.com/html/features.html#trace_options
TRACE_LEVEL_SYSTEM_OUT=3
Also do Java code generation; this way I can benchmark it against MySQL and see where the problem is.
Turn on logging at the finest level
See performance for my AL method and optimize for it - since it is the slowest one anyway.
x
If performance is still not good, I may need to rewrite the code for the methods that take too long, or write new classes optimized for my task
For example: compute the user neighborhood only for specific items, since I need estimates for only a few items, etc.
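The idea in the last item could be sketched as follows: before computing a neighborhood, keep only the users who have rated at least one of the items for which an estimate is needed. This is an illustrative assumption about how the restriction might work, not Taste's API; the class CandidateNeighbors, its method, and the in-memory Map of ratings are all hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical pre-filter for neighborhood computation (not part of Taste). */
class CandidateNeighbors {
    /**
     * Returns the users, other than the target user, who rated at least one
     * of the target items. Only these users can contribute to an estimate
     * for those items, so the similarity computation can skip everyone else.
     */
    static Set<String> forItems(Map<String, Set<String>> itemsByUser,
                                String targetUser,
                                Set<String> targetItems) {
        Set<String> candidates = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : itemsByUser.entrySet()) {
            if (entry.getKey().equals(targetUser)) {
                continue; // a user is not their own neighbor
            }
            for (String item : entry.getValue()) {
                if (targetItems.contains(item)) {
                    candidates.add(entry.getKey());
                    break; // one shared item is enough
                }
            }
        }
        return candidates;
    }
}
```

Similarities would then be computed only against the returned users, which should shrink the work when estimates are needed for just a few items.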
Possible Alternatives
HSQLDB memory mode
MySQL's MEMORY (HEAP) Storage Engine http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html