Note: The db usage in my case may not be typical.
did not work well for me:
administration tools are not user friendly
could not quite figure out how to load data from a file
Nice adminstration tools
Deployment is very simple and well explained in documentation
Speed: in my case slower than MySQL
* Preferered choice in my case
Very nice db, but deployment is rather heavy
I was considering HSQLDB as my primary choice of in memory database. After reading its history on http://en.wikipedia.org/wiki/HSQLDB :
HSQLDB is a relational database management system written in Java. It is based on Thomas Mueller's discontinued Hypersonic SQL Project.[1] He later developed H2 as a complete rewrite.
I decided to give H2 a try. Bellow is the benchmark from the H2's website, which looks quite nice. Hopefully my application will achieve similar results.
My application. I am currently running my algorithm on the top of Taste recommender engine using MySQL. A complete set of experiments for my algorithm and existing to produce results takes 3 days on a Core2 Duo machine. I need to try more than 20 different settings = 60 days. So I decided to try to speed it up. In addition I don't have permission to install db on the supercomputing cluster at my university so in memory db hopefully will solve both of my problems.
Since memory's I/O is much faster than disk I/O this should translate into significant speed up (assuming your db can fit in memory)
Could be run in embeded mode -- all you need is java and you can run it locally inside of your application (no additional sotware needs to be installed on the server etc.).
Note: I am using this setup for explorative learning algorithm; it is not a typical usage of the recommender system.
Single pass: 138 - 3,819 ms (depending on configuration)
# Create Table
CREATE TABLE taste_preferences (
user_id VARCHAR(10) NOT NULL,
item_id VARCHAR(10) NOT NULL,
preference FLOAT NOT NULL,
time_stamp VARCHAR(10),
PRIMARY KEY (user_id, item_id)
)
# Create Indexes
CREATE INDEX IDX_USER_ID ON TASTE_PREFERENCES ( USER_ID )
CREATE INDEX IDX_ITEM_ID ON TASTE_PREFERENCES ( ITEM_ID )
CREATE INDEX IDX_PREFERENCE ON TASTE_PREFERENCES ( PREFERENCE )
# Load data from file
INSERT INTO TASTE_PREFERENCES SELECT * FROM CSVREAD('/home/neil/tmp/u.txt');
see AL_CF.java for more details
Some parts of code are not well documented; the javadoc seems to be out of sync. Had to browse for this one for a while, untill found it.
// Create DB Connection
// H2
String driverName = "org.h2.Driver";
String url = "jdbc:h2:~/test";
String user = "sa";
String pwd = "";
org.h2.jdbcx.JdbcDataSource db = new org.h2.jdbcx.JdbcDataSource();
db.setUser(user);
db.setPassword(pwd);
db.setURL(url);
java.util.NoSuchElementException: Can't retrieve more due to exception: org.h2.jdbc.JdbcSQLException: The result set is not scrollable and can not be reset. You may need to use conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY). [90128-66]
Changed com.planetj.taste.impl.model.jdbc.AbstractJDBCDataModel
* NEIL -- Modified ResultSetUserIterator to fix the following Exception:
* TODO: Do it in a better way
org.h2.jdbc.JdbcSQLException: Syntax error in SQL statement INSERT INTO TASTE_PREFERENCES SET[*] USER_ID=?, ITEM_ID=?, PREFERENCE=? ON DUPLICATE KEY UPDATE PREFERENCE=? ; expected ., (, DEFAULT, VALUES, (, SELECT, FROM; SQL statement:
INSERT INTO taste_preferences SET user_id=?, item_id=?, preference=? ON DUPLICATE KEY UPDATE preference=? [42001-66]
at org.h2.message.Message.getSQLException(Message.java:89)
at org.h2.message.Message.getSQLException(Message.java:93)
at org.h2.message.Message.getSyntaxError(Message.java:103)
at org.h2.command.Parser.getSyntaxError(Parser.java:454)
at org.h2.command.Parser.parseSelectSimple(Parser.java:1445)
at org.h2.command.Parser.parseSelectSub(Parser.java:1366)
at org.h2.command.Parser.parseSelectUnion(Parser.java:1249)
at org.h2.command.Parser.parseSelect(Parser.java:1237)
at org.h2.command.Parser.parseInsert(Parser.java:826)
at org.h2.command.Parser.parsePrepared(Parser.java:343)
at org.h2.command.Parser.parse(Parser.java:265)
at org.h2.command.Parser.parse(Parser.java:241)
at org.h2.command.Parser.prepareCommand(Parser.java:209)
at org.h2.engine.Session.prepareLocal(Session.java:213)
at org.h2.engine.Session.prepareCommand(Session.java:195)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:970)
at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:1206)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:161)
at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.prepareStatement(PoolingDataSource.java:302)
at com.planetj.taste.impl.model.jdbc.AbstractJDBCDataModel.setPreference(AbstractJDBCDataModel.java:420)
at com.planetj.taste.impl.recommender.AbstractRecommender.setPreference(AbstractRecommender.java:81)
at org.hrstc.taste.al.AL_CF.evaluate(AL_CF.java:243)
at org.hrstc.taste.al.AL_CF.main(AL_CF.java:80)
Solution:
INSERT INTO TASTE_PREFERENCES SET USER_ID='22', ITEM_ID='378', PREFERENCE='5.0' ON DUPLICATE KEY UPDATE PREFERENCE='5.0'
Was giving the above exception. Replaced it with:
INSERT INTO TASTE_PREFERENCES (user_id, item_id, preference) VALUES ('22', '378', '5.0') ON DUPLICATE KEY UPDATE PREFERENCE='4.0'
Then ... ON DUPLICATE KEY UPDATE PREFERENCE='4.0' part was not a standard SQL syntax (perhaps specific to MySQL).
A better way would be to use an ANSI/ISO standard command MERGE ( instead of other db specific variants ) e.g.:
MERGE INTO TASTE_PREFERENCES(user_id, item_id, preference) key(user_id,item_id) VALUES ('22', '378', '4.0')
wrote H2JDBCDataModel.java
I finaly got H2 to run (most of the changes where migrating incompatible SQL from MySQL to standard SQL)
The performance of it was rather slow:
INFO: It took ms: 2,119,766
Used memory-only mode http://www.h2database.com/html/features.html#memory_only_databases
INFO: It took ms: 140,568
Thats a 10 fold improvement
Let JVM use more memory; let db use more memory
Did not help
Installed TPTP for eclipse but does not work; could not figure out why
Using NetBeans to do profiling; is throwing some of the old error for some reason.
Fix: The problem with error was that class in jar was not correctly overwritten by NetBeans
For some reason when running from NetBeans userNeighbourhood.size = 0; but when running from command line the same files it is not
Fix: The problem with error was that class in jar was not correctly overwritten by NetBeans
Surprisingly the peformance of MySQL MyISAM and MEMORY engine are almost identical.
Performance of H2 is 10x slower than MySQL. The difference of code between two implementations is that for
H2: MERGE INTO
MySQL: INSERT ... ON DUPLICATE KEY UPDATE
INFO: It took ms: 20,574
Mar 24, 2008 11:05:09 AM org.hrstc.taste.al.AL_CF <init>
INFO: log running..
943
Mar 24, 2008 11:05:11 AM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 11:05:11 AM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 11:05:13 AM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 11:05:13 AM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1577
Mar 24, 2008 11:05:34 AM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.8614681}]
Mar 24, 2008 11:05:34 AM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 20574
CREATE TABLE `taste`.`taste_preferences` (
`user_id` varchar(10) NOT NULL,
`item_id` varchar(10) NOT NULL,
`preference` float NOT NULL,
PRIMARY KEY (`user_id`,`item_id`),
KEY `user_id` (`user_id`),
KEY `item_id` (`item_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
INFO: It took ms: 19,781
Mar 24, 2008 2:55:58 PM org.hrstc.taste.al.AL_CF <init>
INFO: log running..
943
Mar 24, 2008 2:56:00 PM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 2:56:00 PM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 2:56:02 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 2:56:02 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1522
Mar 24, 2008 2:56:22 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.760892}]
Mar 24, 2008 2:56:22 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 19781
Mar 24, 2008 2:56:38 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.760892}, {MAE=0.86437464}]
Mar 24, 2008 2:56:38 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 16243
CREATE TABLE `taste`.`taste_preferences` (
`user_id` varchar(10) NOT NULL,
`item_id` varchar(10) NOT NULL,
`preference` float NOT NULL,
PRIMARY KEY (`user_id`,`item_id`),
KEY `user_id` (`user_id`),
KEY `item_id` (`item_id`)
) ENGINE=MEMORY DEFAULT CHARSET=latin1
INFO: It took ms: 214,899
Mar 24, 2008 3:01:21 PM org.hrstc.taste.al.AL_CF <init>
INFO: loaded data
943
Mar 24, 2008 3:01:23 PM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 3:01:23 PM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 3:01:24 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 3:01:24 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1865
Mar 24, 2008 3:04:59 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.7894972}]
Mar 24, 2008 3:04:59 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 214899
Starting DB: java -cp ./lib/hsqldb.jar org.hsqldb.Server -database.0 file:mydb -dbname.0 xdb
java -cp ./lib/hsqldb.jar org.hsqldb.util.DatabaseManager
keywords: hsqldb load data file csv
Use Text Table ; also see src/org/hsqldb/sample/load_binding_lu.sql
Error: table not found in statement [SET TABLE SOURCE] / Error Code: -22 / State: S0002
Possible Reason: Text Tables cannot be created in memory-only databases (databases that have no script file).
Tried example from src/org/hsqldb/sample/load_binding_lu.sql
CREATE TEXT TABLE binding_tmptxt (
id integer,
name varchar(12)
);
Error: Database is memory only in statement [CREATE TEXT TABLE binding_tmptxt] / Error Code: -63 / State: S1000
Solution: this statement is not allowed for in memory database; start server db instance
When trying sudo a server [org.hsqldb.util.DatabaseManager] get the following error:
java.sql.SQLException: socket creation error
Solution: none
Performance of H2 is 10x slower than MySQL. The difference of code between two implementations is that for
H2: MERGE INTO
MySQL: INSERT ... ON DUPLICATE KEY UPDATE
Changed it but did not make a difference.
Disabled/Enabled Connection pooling but did not produce significant affect either.
Since performance still is not good rewrote code for the methods that take too long; and wrote new classes optimized for my task as seen in the next blog posting ....
http://www.h2database.com/html/features.html#trace_options
TRACE_LEVEL_SYSTEM_OUT=3
Do also java code generation; this way can benchmark it against MySQL and see where the problem is.
Turn on logging finest
See performance for my AL method and optimize for it - since it is the slowest one anyway.
If performance still is not good may need to rewrite code for the methods that take too long; or write new classes optimized for my task
For example: may compute user neighborhood only for the specific items; since I need to get estimates for only a few items, etc.
HSQLDB memory mode
MySQL's MEMORY (HEAP) Storage Engine http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html