Blogs

[MOVED] In Memory Database

Abstract

Note: The db usage in my case may not be typical.

HSQLDB

did not work well for me: 

administration tools are not user friendly

could not quite figure out how to load data from a file

H2

Nice administration tools

Deployment is very simple and well explained in documentation

Speed: in my case slower than MySQL

* Preferred choice in my case

MySQL

Very nice db, but deployment is rather heavy

 

Details

I was considering HSQLDB as my primary choice of in memory database.  After reading its history on http://en.wikipedia.org/wiki/HSQLDB :

HSQLDB is a relational database management system written in Java. It is based on Thomas Mueller's discontinued Hypersonic SQL Project. He later developed H2 as a complete rewrite.

I decided to give H2 a try.  The benchmark on H2's website looks quite nice.  Hopefully my application will achieve similar results.

My application.  I am currently running my algorithm on top of the Taste recommender engine using MySQL.  A complete set of experiments for my algorithm and the existing ones takes 3 days on a Core2 Duo machine.  I need to try more than 20 different settings, which comes to more than 60 days.  So I decided to try to speed it up.  In addition, I don't have permission to install a db on the supercomputing cluster at my university, so an in-memory db will hopefully solve both of my problems.

Reasons for In-Memory Database

Performance

Since memory I/O is much faster than disk I/O, this should translate into a significant speedup (assuming your db fits in memory).

Deployment

Can be run in embedded mode -- all you need is Java, and you can run it locally inside your application (no additional software needs to be installed on the server etc.).

 

Performance

Note: I am using this setup for explorative learning algorithm; it is not a typical usage of the recommender system.

MySQL Taste

Single pass: 138 - 3,819 ms (depending on configuration) 

Taste Movie Lens Notes

DB

# Create Table

CREATE TABLE taste_preferences (
  user_id VARCHAR(10) NOT NULL,
  item_id VARCHAR(10) NOT NULL,
  preference FLOAT NOT NULL,
  time_stamp VARCHAR(10),
  PRIMARY KEY (user_id, item_id)
);

# Create Indexes

CREATE INDEX IDX_USER_ID ON TASTE_PREFERENCES (USER_ID);

CREATE INDEX IDX_ITEM_ID ON TASTE_PREFERENCES (ITEM_ID);

CREATE INDEX IDX_PREFERENCE ON TASTE_PREFERENCES (PREFERENCE);

# Load data from file

INSERT INTO TASTE_PREFERENCES  SELECT * FROM CSVREAD('/home/neil/tmp/u.txt');

see AL_CF.java for more details

Java

Some parts of the code are not well documented; the javadoc seems to be out of sync.  I had to browse for this one for a while until I found it.

        // Create DB Connection   
        // H2
        String driverName = "org.h2.Driver";
        String url = "jdbc:h2:~/test";
        String user = "sa";
        String pwd = "";

        org.h2.jdbcx.JdbcDataSource db = new org.h2.jdbcx.JdbcDataSource();
        db.setUser(user);
        db.setPassword(pwd);
        db.setURL(url);

Exception

java.util.NoSuchElementException: Can't retrieve more due to exception: org.h2.jdbc.JdbcSQLException: The result set is not scrollable and can not be reset. You may need to use conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY). [90128-66]

Fix: changed com.planetj.taste.impl.model.jdbc.AbstractJDBCDataModel
* NEIL -- Modified ResultSetUserIterator to fix the exception above

* TODO: Do it in a better way
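
The exception message itself points at one possible fix: ask the connection for a scrollable, read-only statement. Below is a minimal sketch using only the standard java.sql API. The class and method names (ScrollableStatements, scrollableReadOnly) are hypothetical, and this is not necessarily the change that was made inside ResultSetUserIterator.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; not part of Taste or H2.
final class ScrollableStatements {

    private ScrollableStatements() {
    }

    // Creates a statement whose result sets are scrollable, so they can be
    // rewound (for example with ResultSet.beforeFirst()) instead of being
    // forward-only. These are the two constants named in the H2 error message.
    static Statement scrollableReadOnly(Connection conn) throws SQLException {
        return conn.createStatement(
                ResultSet.TYPE_SCROLL_INSENSITIVE,
                ResultSet.CONCUR_READ_ONLY);
    }
}
```

A PreparedStatement can be requested the same way through conn.prepareStatement(sql, ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY).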

Exception

org.h2.jdbc.JdbcSQLException: Syntax error in SQL statement INSERT INTO TASTE_PREFERENCES SET[*] USER_ID=?, ITEM_ID=?, PREFERENCE=? ON DUPLICATE KEY UPDATE PREFERENCE=? ; expected ., (, DEFAULT, VALUES, (, SELECT, FROM; SQL statement:
INSERT INTO taste_preferences SET user_id=?, item_id=?, preference=? ON DUPLICATE KEY UPDATE preference=? [42001-66]
    at org.h2.message.Message.getSQLException(Message.java:89)
    at org.h2.message.Message.getSQLException(Message.java:93)
    at org.h2.message.Message.getSyntaxError(Message.java:103)
    at org.h2.command.Parser.getSyntaxError(Parser.java:454)
    at org.h2.command.Parser.parseSelectSimple(Parser.java:1445)
    at org.h2.command.Parser.parseSelectSub(Parser.java:1366)
    at org.h2.command.Parser.parseSelectUnion(Parser.java:1249)
    at org.h2.command.Parser.parseSelect(Parser.java:1237)
    at org.h2.command.Parser.parseInsert(Parser.java:826)
    at org.h2.command.Parser.parsePrepared(Parser.java:343)
    at org.h2.command.Parser.parse(Parser.java:265)
    at org.h2.command.Parser.parse(Parser.java:241)
    at org.h2.command.Parser.prepareCommand(Parser.java:209)
    at org.h2.engine.Session.prepareLocal(Session.java:213)
    at org.h2.engine.Session.prepareCommand(Session.java:195)
    at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:970)
    at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:1206)
    at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:161)
    at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.prepareStatement(PoolingDataSource.java:302)
    at com.planetj.taste.impl.model.jdbc.AbstractJDBCDataModel.setPreference(AbstractJDBCDataModel.java:420)
    at com.planetj.taste.impl.recommender.AbstractRecommender.setPreference(AbstractRecommender.java:81)
    at org.hrstc.taste.al.AL_CF.evaluate(AL_CF.java:243)
    at org.hrstc.taste.al.AL_CF.main(AL_CF.java:80)

Solution:

The statement that gave the above exception was:
INSERT INTO TASTE_PREFERENCES SET USER_ID='22', ITEM_ID='378', PREFERENCE='5.0' ON DUPLICATE KEY UPDATE PREFERENCE='5.0'

I first replaced the INSERT ... SET form with a column list and VALUES:
INSERT INTO TASTE_PREFERENCES (user_id, item_id, preference) VALUES ('22', '378', '5.0') ON DUPLICATE KEY UPDATE PREFERENCE='4.0'

Then the ... ON DUPLICATE KEY UPDATE PREFERENCE='4.0' part turned out not to be standard SQL either (it is a MySQL extension).

A better way is to use MERGE instead of a db-specific variant. MERGE is part of the SQL standard, although the KEY(...) form used here is H2's own syntax; the standard form is MERGE INTO ... USING ... ON ... . E.g.:
MERGE INTO TASTE_PREFERENCES(user_id, item_id, preference) key(user_id,item_id) VALUES ('22', '378', '4.0')

Wrote H2JDBCDataModel.java with these changes.
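
In application code the MERGE would normally go through a PreparedStatement with placeholders rather than literal values. Here is a minimal sketch of building such a statement string, assuming H2's MERGE INTO ... KEY (...) syntax. The helper name H2MergeSql is hypothetical and this is not the actual content of H2JDBCDataModel.java.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; builds an H2-style MERGE with one '?' per column.
final class H2MergeSql {

    private H2MergeSql() {
    }

    // Table and column names are concatenated as-is, so they must be trusted
    // identifiers from the code, never user input. Values go in through the
    // '?' placeholders of a PreparedStatement.
    static String mergeInto(String table, List<String> columns, List<String> keyColumns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "MERGE INTO " + table
                + " (" + String.join(", ", columns) + ")"
                + " KEY (" + String.join(", ", keyColumns) + ")"
                + " VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(mergeInto(
                "TASTE_PREFERENCES",
                Arrays.asList("user_id", "item_id", "preference"),
                Arrays.asList("user_id", "item_id")));
    }
}
```

The resulting string can be passed to Connection.prepareStatement, with the user id, item id and preference bound in that order.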

Performance Tweaking

I finally got H2 to run (most of the changes were migrating incompatible SQL from MySQL to standard SQL).

The performance of it was rather slow:
INFO: It took ms: 2,119,766

Used memory-only mode http://www.h2database.com/html/features.html#memory_only_databases
INFO: It took ms: 140,568
That's roughly a 15-fold improvement (2,119,766 ms down to 140,568 ms)
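
Switching to memory-only mode comes down to the JDBC URL. The sketch below contrasts the embedded file URL used earlier (jdbc:h2:~/test) with the in-memory form (jdbc:h2:mem:name) described in the H2 documentation. The helper class H2Urls and the database name "test" are assumptions for illustration. DB_CLOSE_DELAY=-1 is an H2 setting that keeps an in-memory database alive after the last connection closes.

```java
// Hypothetical helper; builds the two kinds of H2 JDBC URLs discussed here.
final class H2Urls {

    private H2Urls() {
    }

    // Embedded mode backed by a file, e.g. jdbc:h2:~/test
    static String embeddedFile(String path) {
        return "jdbc:h2:" + path;
    }

    // Memory-only mode, e.g. jdbc:h2:mem:test
    // With keepOpen, DB_CLOSE_DELAY=-1 is appended so the data is not lost
    // when the last connection to the database is closed.
    static String inMemory(String name, boolean keepOpen) {
        String url = "jdbc:h2:mem:" + name;
        return keepOpen ? url + ";DB_CLOSE_DELAY=-1" : url;
    }

    public static void main(String[] args) {
        System.out.println(embeddedFile("~/test"));
        System.out.println(inMemory("test", false));
        System.out.println(inMemory("test", true));
    }
}
```

Either URL can be passed to JdbcDataSource.setURL. A memory-only database starts empty, so the table has to be created and loaded again on every run.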

Let the JVM use more memory; let the db use more memory.
Neither helped.

Profiling

Installed TPTP for Eclipse, but it does not work; could not figure out why.

Using NetBeans to do the profiling; it is throwing some of the old errors for some reason.
Fix: the problem was that the class in the jar was not correctly overwritten by NetBeans.

For some reason, when running from NetBeans userNeighbourhood.size = 0, but when running the same files from the command line it is not.
Fix: same cause; the class in the jar was not correctly overwritten by NetBeans.

 

Performance

Surprisingly, the performance of the MySQL MyISAM and MEMORY engines is almost identical.

H2 is about 10x slower than MySQL.  The difference in code between the two implementations is the statement used to insert or update a preference:

H2: MERGE INTO

MySQL: INSERT ... ON DUPLICATE KEY UPDATE

MySQL

MyISAM Engine

INFO: It took ms: 20,574

Mar 24, 2008 11:05:09 AM org.hrstc.taste.al.AL_CF <init>
INFO: log running..
943
Mar 24, 2008 11:05:11 AM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 11:05:11 AM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 11:05:13 AM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 11:05:13 AM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1577
Mar 24, 2008 11:05:34 AM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.8614681}]
Mar 24, 2008 11:05:34 AM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 20574

CREATE TABLE  `taste`.`taste_preferences` (
  `user_id` varchar(10) NOT NULL,
  `item_id` varchar(10) NOT NULL,
  `preference` float NOT NULL,
  PRIMARY KEY  (`user_id`,`item_id`),
  KEY `user_id` (`user_id`),
  KEY `item_id` (`item_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Memory Engine

INFO: It took ms: 19,781

Mar 24, 2008 2:55:58 PM org.hrstc.taste.al.AL_CF <init>
INFO: log running..
943
Mar 24, 2008 2:56:00 PM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 2:56:00 PM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 2:56:02 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 2:56:02 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1522
Mar 24, 2008 2:56:22 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.760892}]
Mar 24, 2008 2:56:22 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 19781
Mar 24, 2008 2:56:38 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.760892}, {MAE=0.86437464}]
Mar 24, 2008 2:56:38 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 16243
 

CREATE TABLE  `taste`.`taste_preferences` (
  `user_id` varchar(10) NOT NULL,
  `item_id` varchar(10) NOT NULL,
  `preference` float NOT NULL,
  PRIMARY KEY  (`user_id`,`item_id`),
  KEY `user_id` (`user_id`),
  KEY `item_id` (`item_id`)
) ENGINE=MEMORY DEFAULT CHARSET=latin1

H2

INFO: It took ms: 214,899

Mar 24, 2008 3:01:21 PM org.hrstc.taste.al.AL_CF <init>
INFO: loaded data
943
Mar 24, 2008 3:01:23 PM org.hrstc.taste.al.AL_CF getTestUsers
INFO: Selected userID: 421
Mar 24, 2008 3:01:23 PM org.hrstc.taste.al.AL_CF evaluate
INFO: AL Type: random
Mar 24, 2008 3:01:24 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}]
Mar 24, 2008 3:01:24 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 1865
Mar 24, 2008 3:04:59 PM org.hrstc.taste.al.AL_CF evaluate
INFO: User's Stats:
[{MAE=5.0}, {MAE=0.7894972}]
Mar 24, 2008 3:04:59 PM org.hrstc.taste.al.AL_CF evaluate
INFO: It took ms: 214899
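
The "10x slower" figure can be checked against the second-pass timings in the logs above. A small sketch of the arithmetic follows; the class and method names are hypothetical, and the millisecond values are copied from the log lines.

```java
import java.util.Locale;

// Hypothetical helper; the constants are the second-pass "It took ms" values
// from the MyISAM, MEMORY and H2 logs.
final class EngineTimings {

    static final long MYISAM_MS = 20_574;
    static final long MEMORY_MS = 19_781;
    static final long H2_MS = 214_899;

    private EngineTimings() {
    }

    // How many times slower the candidate is than the baseline.
    static double slowdown(long candidateMs, long baselineMs) {
        return (double) candidateMs / baselineMs;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "H2 vs MyISAM: %.1fx%n", slowdown(H2_MS, MYISAM_MS));
        System.out.printf(Locale.ROOT, "H2 vs MEMORY: %.1fx%n", slowdown(H2_MS, MEMORY_MS));
        System.out.printf(Locale.ROOT, "MyISAM vs MEMORY: %.2fx%n", slowdown(MYISAM_MS, MEMORY_MS));
    }
}
```

These are single runs with a randomly driven active-learning step, so the ratios are only a rough indication.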

HSQLDB

Starting DB: java -cp ./lib/hsqldb.jar org.hsqldb.Server -database.0 file:mydb -dbname.0 xdb

java -cp ./lib/hsqldb.jar org.hsqldb.util.DatabaseManager

Loading Data

keywords: hsqldb load data file csv

Use Text Table ; also see src/org/hsqldb/sample/load_binding_lu.sql

Error

Error: table not found in statement [SET TABLE SOURCE] / Error Code: -22 / State: S0002

Possible Reason: Text Tables cannot be created in memory-only databases (databases that have no script file).

Tried example from src/org/hsqldb/sample/load_binding_lu.sql

CREATE TEXT TABLE binding_tmptxt (
    id integer,
    name varchar(12)
);

Error: Database is memory only in statement [CREATE TEXT TABLE binding_tmptxt] / Error Code: -63 / State: S1000

Solution: this statement is not allowed for an in-memory database; start a server db instance instead

Error

When trying sudo a server [org.hsqldb.util.DatabaseManager] get the following error:

java.sql.SQLException: socket creation error

Solution: none
 

 

Various

Performance of H2 is 10x slower than MySQL.  The difference in code between the two implementations is:

H2: MERGE INTO

MySQL: INSERT ... ON DUPLICATE KEY UPDATE

Changed it, but it did not make a difference.

Disabling/enabling connection pooling did not produce a significant effect either.

Taste Recommender Code Improvement

Since performance is still not good, I rewrote the code for the methods that take too long and wrote new classes optimized for my task, as seen in the next blog posting ....

 

ToDo

 

Tracing

http://www.h2database.com/html/features.html#trace_options

TRACE_LEVEL_SYSTEM_OUT=3

Also do the Java code generation; this way I can benchmark it against MySQL and see where the problem is.
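
H2 settings such as TRACE_LEVEL_SYSTEM_OUT can be appended to the JDBC URL, separated by semicolons. Below is a minimal sketch of doing that from Java. The helper name H2UrlSettings is hypothetical, and the base URL is the one used earlier in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; appends ";NAME=VALUE" settings to an H2 JDBC URL.
final class H2UrlSettings {

    private H2UrlSettings() {
    }

    // Settings are appended in the iteration order of the map.
    static String withSettings(String baseUrl, Map<String, String> settings) {
        StringBuilder url = new StringBuilder(baseUrl);
        for (Map.Entry<String, String> setting : settings.entrySet()) {
            url.append(';').append(setting.getKey()).append('=').append(setting.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("TRACE_LEVEL_SYSTEM_OUT", "3"); // 3 is the most verbose (debug) level
        System.out.println(withSettings("jdbc:h2:~/test", settings));
    }
}
```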

 

Turn on logging finest

See performance for my AL method and optimize for it - since it is the slowest one anyway. 

 

If performance is still not good, I may need to rewrite the code for the methods that take too long, or write new classes optimized for my task.

For example: I could compute the user neighborhood only for specific items, since I need estimates for only a few items, etc.
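
One way to read this idea: only users who rated at least one of the target items can contribute to an estimate for those items, so only they need to be considered as neighbours. The sketch below works on plain maps rather than the Taste API. All names are hypothetical, and it only illustrates the filtering step, not a full neighbourhood computation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not part of Taste.
final class CandidateNeighbours {

    private CandidateNeighbours() {
    }

    // prefs maps user_id -> (item_id -> preference).
    // Returns the users, other than targetUser, who rated at least one target item.
    static Set<String> forItems(Map<String, Map<String, Float>> prefs,
                                String targetUser,
                                Set<String> targetItems) {
        Set<String> candidates = new TreeSet<>();
        for (Map.Entry<String, Map<String, Float>> entry : prefs.entrySet()) {
            if (entry.getKey().equals(targetUser)) {
                continue; // a user is not their own neighbour
            }
            if (!Collections.disjoint(entry.getValue().keySet(), targetItems)) {
                candidates.add(entry.getKey());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        Map<String, Map<String, Float>> prefs = new HashMap<>();
        prefs.put("421", Collections.singletonMap("10", 4.0f));
        prefs.put("22", Collections.singletonMap("378", 5.0f));
        prefs.put("7", Collections.singletonMap("99", 3.0f));
        Set<String> targetItems = new HashSet<>(Arrays.asList("378"));
        System.out.println(forItems(prefs, "421", targetItems));
    }
}
```

Similarities would then be computed only against this smaller candidate set instead of against every user.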

Possible Alternatives

HSQLDB memory mode

MySQL's MEMORY (HEAP) Storage Engine http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html

 

MySQL and Taste

MySQL

After installing mysql with the Synaptic package manager

Go to the menu Applications/Programming/MySQL Administrator
hostname: localhost
username: root
password: your password

If you are not using the mysql server all the time, you may want to remove it from the services and start it manually via System/Administration/Services

Taste

Taste version 1.7.1 was used

To create a database named taste, click on Catalogs, then under Schemata right-click and select create schema "taste"

Then use MySQL Query Browser and follow the instructions to create the table at http://taste.sourceforge.net/javadoc/com/planetj/taste/impl/model/jdbc/M...

 

Loading MovieLens Data

# for 1 million

LOAD DATA INFILE 'ratings.dat' INTO TABLE taste_preferences FIELDS TERMINATED BY '::';

# for 100K

LOAD DATA INFILE '/home/neil/tmp/u.data' INTO TABLE taste_preferences FIELDS TERMINATED BY '\t';
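
The two MovieLens files differ only in the field delimiter: '::' for the 1 million set and a tab for the 100K set. Below is a minimal sketch of reading one line of either file in Java, for cases where LOAD DATA is not available. The class name MovieLensLine is hypothetical, and the sample lines are only illustrations of the four-field layout (user, item, rating, timestamp).

```java
import java.util.regex.Pattern;

// Hypothetical value class; one parsed line of a MovieLens ratings file.
final class MovieLensLine {

    final String userId;
    final String itemId;
    final float preference;
    final String timeStamp;

    private MovieLensLine(String userId, String itemId, float preference, String timeStamp) {
        this.userId = userId;
        this.itemId = itemId;
        this.preference = preference;
        this.timeStamp = timeStamp;
    }

    // Splits on the literal delimiter ("::" or "\t"); Pattern.quote keeps the
    // delimiter from being interpreted as a regular expression.
    static MovieLensLine parse(String line, String delimiter) {
        String[] fields = line.split(Pattern.quote(delimiter));
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields but got " + fields.length + ": " + line);
        }
        return new MovieLensLine(fields[0], fields[1], Float.parseFloat(fields[2]), fields[3]);
    }

    public static void main(String[] args) {
        MovieLensLine a = parse("1::1193::5::978300760", "::");
        MovieLensLine b = parse("196\t242\t3\t881250949", "\t");
        System.out.println(a.userId + " " + a.itemId + " " + a.preference);
        System.out.println(b.userId + " " + b.itemId + " " + b.preference);
    }
}
```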

 

 

 

 

 

 

Notes:

Some say that recommendation technology represents the new paradigm of search: interesting items find the user instead of the user explicitly searching for them. In an article published in CNN Money, entitled “The race to create a 'smart' Google”, Fortune magazine writer Jeffrey M. O'Brien writes:

The Web, they say, is leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you.

 

Various Issues

Issue 1

WARNING: You are not using ConnectionPoolDataSource. Make sure your DataSource pools connections to the database itself, or database performance will be severely reduced.
 

FIX: ConnectionPoolDataSource dataSource = new MysqlConnectionPoolDataSource();

 

 

Applet Issues

I am trying to get an applet working that makes a URL connection.  I have run into quite a few problems.  Here are some ways of solving them.

A nice guide from Sun: http://java.sun.com/docs/books/tutorial/security/toolsign/index.html


Signing Applet Jar

http://yellowcat1.free.fr/keytool_iui.html

A very nice GUI tool; saved me a lot of time.


http://www.codeguru.com/forum/showthread.php?t=40310

Shopping for laptop in Japan

This is a quick guide for buying a mid level laptop (based on recommendations given to one of my friends recently). 
For an ultraportable, the ASUS Eee PC is perhaps the clear leader (for now) http://kakaku.com/item/00200916376/

A very nice site for comparing the prices is http://kakaku.com/sku/pricemenu/winn.htm
Note: sometimes you can get quite a bit better deal by clicking on one of the advertisements on the page.

Installing R GUI (JGR) on Ubuntu

I had the following error while trying to install JGR (included below). This problem is described in http://ubuntuforums.org/showthread.php?t=424921 although the suggestions listed there did not work for me. Here is what did work:

Blog 2

This is my blog entry and so on....
