[MOVED] In Memory Database

Abstract

Note: The db usage in my case may not be typical.

HSQLDB

did not work well for me: 

administration tools are not user friendly

could not quite figure out how to load data from a file

H2

Nice administration tools

Deployment is very simple and well explained in documentation

Speed: in my case slower than MySQL

* Preferred choice in my case

MySQL

Very nice db, but deployment is rather heavy

 

Details

I was considering HSQLDB as my primary choice of in memory database.  After reading its history on http://en.wikipedia.org/wiki/HSQLDB :

HSQLDB is a relational database management system written in Java. It is based on Thomas Mueller's discontinued Hypersonic SQL Project.[1] He later developed H2 as a complete rewrite.

I decided to give H2 a try.  Below is the benchmark from H2's website, which looks quite nice.  Hopefully my application will achieve similar results.

My application: I am currently running my algorithm on top of the Taste recommender engine using MySQL.  A complete set of experiments comparing my algorithm with the existing ones takes 3 days on a Core 2 Duo machine.  I need to try more than 20 different settings, i.e. more than 60 days in total, so I decided to try to speed it up.  In addition, I don't have permission to install a db on the supercomputing cluster at my university, so an in-memory db will hopefully solve both of my problems.

Reasons for In-Memory Database

Performance

Since memory I/O is much faster than disk I/O, this should translate into a significant speed-up (assuming your db fits in memory)

Deployment

Can be run in embedded mode: all you need is Java, and the db runs locally inside your application (no additional software needs to be installed on the server, etc.).

 

Performance

Note: I am using this setup for an explorative learning algorithm; this is not typical usage of a recommender system.

MySQL Taste

Single pass: 138 - 3,819 ms (depending on configuration) 

Taste Movie Lens Notes

DB

# Create Table

 CREATE TABLE taste_preferences (
   user_id VARCHAR(10) NOT NULL,
   item_id VARCHAR(10) NOT NULL,
   preference FLOAT NOT NULL,
   time_stamp VARCHAR(10),
   PRIMARY KEY (user_id, item_id)
 )

# Create Indexes

CREATE INDEX IDX_USER_ID ON TASTE_PREFERENCES  ( USER_ID )

CREATE INDEX IDX_ITEM_ID ON TASTE_PREFERENCES  ( ITEM_ID )

CREATE INDEX IDX_PREFERENCE ON TASTE_PREFERENCES  ( PREFERENCE  )

# Load data from file

INSERT INTO TASTE_PREFERENCES  SELECT * FROM CSVREAD('/home/neil/tmp/u.txt');

see AL_CF.java for more details

Java

Some parts of the code are not well documented; the Javadoc seems to be out of sync.  I had to browse for a while until I found this:

        // Create DB Connection   
        // H2
        String driverName = "org.h2.Driver";
        String url = "jdbc:h2:~/test";
        String user = "sa";
        String pwd = "";

        org.h2.jdbcx.JdbcDataSource db = new org.h2.jdbcx.JdbcDataSource();
        db.setUser(user);
        db.setPassword(pwd);
        db.setURL(url);

Exception

java.util.NoSuchElementException: Can't retrieve more due to exception: org.h2.jdbc.JdbcSQLException: The result set is not scrollable and can not be reset. You may need to use conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY). [90128-66]

Changed com.planetj.taste.impl.model.jdbc.AbstractJDBCDataModel
* NEIL -- Modified ResultSetUserIterator to fix the exception above

* TODO: Do it in a better way
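
The H2 error message itself suggests the fix: ask for a scrollable result set when creating the statement. A minimal sketch of that call, using only the standard java.sql API, could look like the following. The class and method names are hypothetical and only for illustration; this is not the actual change made to ResultSetUserIterator.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; not part of Taste or H2.
final class ScrollableQuery {
    private ScrollableQuery() {}

    // Creates a statement whose result sets can be rewound (for example with
    // ResultSet.beforeFirst()) instead of the default forward-only cursor.
    static Statement createScrollableStatement(Connection conn) throws SQLException {
        return conn.createStatement(
                ResultSet.TYPE_SCROLL_INSENSITIVE,
                ResultSet.CONCUR_READ_ONLY);
    }
}
```

Result sets produced by such a statement can be reset and iterated again, which is what the iterator in the exception above was trying to do.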

Exception

org.h2.jdbc.JdbcSQLException: Syntax error in SQL statement INSERT INTO TASTE_PREFERENCES SET[*] USER_ID=?, ITEM_ID=?, PREFERENCE=? ON DUPLICATE KEY UPDATE PREFERENCE=? ; expected ., (, DEFAULT, VALUES, (, SELECT, FROM; SQL statement:
INSERT INTO taste_preferences SET user_id=?, item_id=?, preference=? ON DUPLICATE KEY UPDATE preference=? [42001-66]
    at org.h2.message.Message.getSQLException(Message.java:89)
    at org.h2.message.Message.getSQLException(Message.java:93)
    at org.h2.message.Message.getSyntaxError(Message.java:103)
    at org.h2.command.Parser.getSyntaxError(Parser.java:454)
    at org.h2.command.Parser.parseSelectSimple(Parser.java:1445)
    at org.h2.command.Parser.parseSelectSub(Parser.java:1366)
    at org.h2.command.Parser.parseSelectUnion(Parser.java:1249)
    at org.h2.command.Parser.parseSelect(Parser.java:1237)
    at org.h2.command.Parser.parseInsert(Parser.java:826)
    at org.h2.command.Parser.parsePrepared(Parser.java:343)
    at org.h2.command.Parser.parse(Parser.java:265)
    at org.h2.command.Parser.parse(Parser.java:241)
    at org.h2.command.Parser.prepareCommand(Parser.java:209)
    at org.h2.engine.Session.prepareLocal(Session.java:213)
    at org.h2.engine.Session.prepareCommand(Session.java:195)
    at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:970)
    at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:1206)
    at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:161)
    at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.prepareStatement(PoolingDataSource.java:302)
    at com.planetj.taste.impl.model.jdbc.AbstractJDBCDataModel.setPreference(AbstractJDBCDataModel.java:420)
    at com.planetj.taste.impl.recommender.AbstractRecommender.setPreference(AbstractRecommender.java:81)
    at org.hrstc.taste.al.AL_CF.evaluate(AL_CF.java:243)
    at org.hrstc.taste.al.AL_CF.main(AL_CF.java:80)

Solution:

INSERT INTO TASTE_PREFERENCES SET USER_ID='22', ITEM_ID='378', PREFERENCE='5.0' ON DUPLICATE KEY UPDATE PREFERENCE='5.0'

This statement was giving the above exception (INSERT ... SET is MySQL-specific syntax).  Replaced it with:
INSERT INTO TASTE_PREFERENCES (user_id, item_id, preference) VALUES ('22', '378', '5.0') ON DUPLICATE KEY UPDATE PREFERENCE='4.0'

The ... ON DUPLICATE KEY UPDATE PREFERENCE='4.0' part is then still not standard SQL syntax (it is a MySQL extension).

A better way is to use MERGE instead of the other db-specific variants.  Note that MERGE is part of the ANSI/ISO standard (SQL:2003) in the form MERGE ... USING ... ON ... WHEN MATCHED; the short form with KEY(...) used here is H2's own variant.  E.g.:
MERGE INTO TASTE_PREFERENCES(user_id, item_id, preference) key(user_id,item_id) VALUES ('22', '378', '4.0')
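
To make the difference concrete for the JDBC code, here is a small sketch that returns the parameterized upsert statement per database. It is an illustration only: the UpsertSql class, its Dialect enum and the method names are hypothetical and are not the contents of H2JDBCDataModel.java. The point it shows is that the MySQL form binds the preference twice (4 placeholders) while the H2 MERGE form binds it once (3 placeholders), so the code that sets the parameters has to change too.

```java
// Hypothetical helper for illustration; not the actual H2JDBCDataModel code.
final class UpsertSql {
    enum Dialect { MYSQL, H2 }

    private UpsertSql() {}

    // Returns a parameterized "insert or update" statement for the given dialect.
    static String forDialect(Dialect dialect) {
        switch (dialect) {
            case MYSQL:
                // MySQL extension: the new preference is bound twice (4 parameters).
                return "INSERT INTO taste_preferences (user_id, item_id, preference) "
                        + "VALUES (?, ?, ?) ON DUPLICATE KEY UPDATE preference = ?";
            case H2:
                // H2's MERGE with an explicit KEY list (3 parameters).
                return "MERGE INTO taste_preferences (user_id, item_id, preference) "
                        + "KEY (user_id, item_id) VALUES (?, ?, ?)";
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }

    // Counts the '?' placeholders so callers know how many values to bind.
    static int parameterCount(String sql) {
        int count = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') {
                count++;
            }
        }
        return count;
    }
}
```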

wrote H2JDBCDataModel.java

Performance Tweaking

I finally got H2 to run (most of the changes were migrating incompatible SQL from MySQL syntax to SQL that H2 accepts)

The performance was rather slow:
INFO: It took ms: 2,119,766

Used memory-only mode http://www.h2database.com/html/features.html#memory_only_databases
INFO: It took ms: 140,568
That's roughly a 15-fold improvement (2,119,766 / 140,568 ≈ 15.1)
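
For reference, switching H2 between the file-backed and the memory-only mode is only a matter of the JDBC URL. The sketch below builds both forms; the H2Urls class is a hypothetical helper, and the database name passed to inMemory is up to you. DB_CLOSE_DELAY=-1 is an H2 setting that keeps a named in-memory database alive until the JVM exits instead of dropping it when the last connection closes.

```java
// Hypothetical helper for building H2 JDBC URLs; illustration only.
final class H2Urls {
    private H2Urls() {}

    // Embedded, file-backed database, e.g. "jdbc:h2:~/test".
    static String embeddedFile(String path) {
        return "jdbc:h2:" + path;
    }

    // Named in-memory database that is kept for the lifetime of the JVM.
    static String inMemory(String name) {
        return "jdbc:h2:mem:" + name + ";DB_CLOSE_DELAY=-1";
    }
}
```

With the data source shown earlier this would be used as db.setURL(H2Urls.inMemory("taste")), where "taste" is just an example name. Keep in mind that a memory-only database starts out empty, so the table has to be created and the data loaded again on every start.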

Letting the JVM use more memory and letting the db use more memory
did not help

Profiling

Installed TPTP for Eclipse, but it does not work; could not figure out why

Using NetBeans for profiling instead; it was throwing some of the old errors for some reason.
Fix: the class in the jar was not being correctly overwritten by NetBeans

For some reason, when running from NetBeans userNeighbourhood.size = 0, but when running the same files from the command line it is not
Fix: same cause; the class in the jar was not being correctly overwritten by NetBeans

 

Performance

Surprisingly, the performance of the MySQL MyISAM and MEMORY engines is almost identical.

H2 is about 10x slower than MySQL (214,899 ms vs. 20,574 ms, see the logs below).  The only difference in code between the two implementations is the upsert statement:

H2: MERGE INTO

MySQL: INSERT ... ON DUPLICATE KEY UPDATE

MySQL

MyISAM Engine

INFO: It took ms: 20,574

Mar 24, 2008 11:05:09 AM org.hrstc.taste.al.AL_CF <init>
INFO: log running..
943
Mar 24, 2008 11:05:11 AM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 11:05:11 AM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 11:05:13 AM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 11:05:13 AM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1577
Mar 24, 2008 11:05:34 AM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.8614681}]
Mar 24, 2008 11:05:34 AM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 20574

CREATE TABLE  `taste`.`taste_preferences` (
  `user_id` varchar(10) NOT NULL,
  `item_id` varchar(10) NOT NULL,
  `preference` float NOT NULL,
  PRIMARY KEY  (`user_id`,`item_id`),
  KEY `user_id` (`user_id`),
  KEY `item_id` (`item_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Memory Engine

INFO: It took ms: 19,781

Mar 24, 2008 2:55:58 PM org.hrstc.taste.al.AL_CF <init>
INFO: log running..
943
Mar 24, 2008 2:56:00 PM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 2:56:00 PM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 2:56:02 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 2:56:02 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1522
Mar 24, 2008 2:56:22 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.760892}]
Mar 24, 2008 2:56:22 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 19781
Mar 24, 2008 2:56:38 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.760892}, {MAE=0.86437464}]
Mar 24, 2008 2:56:38 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 16243
 

CREATE TABLE  `taste`.`taste_preferences` (
  `user_id` varchar(10) NOT NULL,
  `item_id` varchar(10) NOT NULL,
  `preference` float NOT NULL,
  PRIMARY KEY  (`user_id`,`item_id`),
  KEY `user_id` (`user_id`),
  KEY `item_id` (`item_id`)
) ENGINE=MEMORY DEFAULT CHARSET=latin1

H2

INFO: It took ms: 214,899

Mar 24, 2008 3:01:21 PM org.hrstc.taste.al.AL_CF <init>
INFO: loaded data
943
Mar 24, 2008 3:01:23 PM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 3:01:23 PM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 3:01:24 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 3:01:24 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1865
Mar 24, 2008 3:04:59 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.7894972}]
Mar 24, 2008 3:04:59 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 214899
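
Putting the three second-pass timings from the logs above side by side, a few lines of Java make the ratios explicit. The numbers are copied from the logs; the class and method names are hypothetical.

```java
// Compares the second-pass timings reported in the logs above.
final class TimingComparison {
    static final long MYISAM_MS = 20_574;   // MySQL, MyISAM engine
    static final long MEMORY_MS = 19_781;   // MySQL, MEMORY engine
    static final long H2_MS = 214_899;      // H2, memory-only mode

    private TimingComparison() {}

    // Returns how many times slower the candidate is than the baseline.
    static double slowdown(long candidateMs, long baselineMs) {
        if (baselineMs <= 0) {
            throw new IllegalArgumentException("baselineMs must be positive");
        }
        return (double) candidateMs / baselineMs;
    }
}
```

TimingComparison.slowdown(TimingComparison.H2_MS, TimingComparison.MYISAM_MS) comes out at a bit over 10, while the MEMORY engine lands within a few percent of MyISAM. These are single runs, so the small MyISAM vs. MEMORY difference may well be noise.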

HSQLDB

Starting DB: java -cp ./lib/hsqldb.jar org.hsqldb.Server -database.0 file:mydb -dbname.0 xdb

java -cp ./lib/hsqldb.jar org.hsqldb.util.DatabaseManager

Loading Data

keywords: hsqldb load data file csv

Use Text Table ; also see src/org/hsqldb/sample/load_binding_lu.sql

Error

Error: table not found in statement [SET TABLE SOURCE] / Error Code: -22 / State: S0002

Possible Reason: Text Tables cannot be created in memory-only databases (databases that have no script file).

Tried example from src/org/hsqldb/sample/load_binding_lu.sql

CREATE TEXT TABLE binding_tmptxt (
    id integer,
    name varchar(12)
);

Error: Database is memory only in statement [CREATE TEXT TABLE binding_tmptxt] / Error Code: -63 / State: S1000

Solution: this statement is not allowed for an in-memory database; start a server db instance instead

Error

When trying sudo a server [org.hsqldb.util.DatabaseManager] get the following error:

java.sql.SQLException: socket creation error

Solution: none
 

 

Various

Since H2 is about 10x slower than MySQL and the only difference in code between the two implementations is the upsert statement (H2: MERGE INTO; MySQL: INSERT ... ON DUPLICATE KEY UPDATE), I tried changing that statement, but it did not make a difference.

 

Disabling/enabling connection pooling did not have a significant effect either.

Taste Recommender Code Improvement

Since performance is still not good, I rewrote the code for the methods that take too long and wrote new classes optimized for my task, as seen in the next blog posting ....

 

ToDo

 

Tracing

http://www.h2database.com/html/features.html#trace_options

TRACE_LEVEL_SYSTEM_OUT=3

Also turn on Java code generation; that way I can benchmark it against MySQL and see where the problem is.
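
The trace setting above is appended to the JDBC URL after a semicolon. A minimal sketch, assuming the URL has no trailing semicolon yet; the H2Trace class is a hypothetical helper, while TRACE_LEVEL_SYSTEM_OUT and its levels 0 to 3 (off, error, info, debug) are the H2 trace options from the page linked above.

```java
// Hypothetical helper that appends H2's System.out trace level to a JDBC URL.
final class H2Trace {
    private H2Trace() {}

    // level: 0 = off, 1 = error, 2 = info, 3 = debug
    static String withSystemOutTrace(String url, int level) {
        if (level < 0 || level > 3) {
            throw new IllegalArgumentException("trace level must be between 0 and 3");
        }
        return url + ";TRACE_LEVEL_SYSTEM_OUT=" + level;
    }
}
```

For the connection used earlier this gives jdbc:h2:~/test;TRACE_LEVEL_SYSTEM_OUT=3.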

 

Turn logging up to FINEST

See performance for my AL method and optimize for it - since it is the slowest one anyway. 

 

If performance is still not good, I may need to rewrite the code for the methods that take too long, or write new classes optimized for my task

For example: compute the user neighborhood only for specific items, since I need estimates for only a few items, etc.
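
One way this idea could look in code is sketched below: before any similarity is computed, restrict the candidate neighbours to users who have rated at least one of the items that need an estimate. This is plain Java over an in-memory map and does not use the Taste API; the class name, method name and data layout (userId -> itemId -> preference) are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pre-filter; not part of Taste.
final class CandidateNeighbours {
    private CandidateNeighbours() {}

    // Returns the users (other than targetUser) who rated at least one of
    // targetItems. Only these users can contribute to an estimate for those
    // items, so similarities need to be computed for them alone.
    static Set<String> usersWhoRatedAny(Map<String, Map<String, Float>> prefs,
                                        String targetUser,
                                        Set<String> targetItems) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Map<String, Float>> entry : prefs.entrySet()) {
            if (entry.getKey().equals(targetUser)) {
                continue; // a user is not their own neighbour
            }
            for (String item : targetItems) {
                if (entry.getValue().containsKey(item)) {
                    result.add(entry.getKey());
                    break; // one matching item is enough
                }
            }
        }
        return result;
    }
}
```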

Possible Alternatives

HSQLDB memory mode

MySQL's MEMORY (HEAP) Storage Engine http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html

 
