

Lecture [video]: Mismatched Models, Wrong Results, and Dreadful Decisions

Great lecture; a must watch:

Mismatched Models, Wrong Results, and Dreadful Decisions

author: David J. Hand, Department of Mathematics, Imperial College London


Data mining techniques use score functions to quantify how well a model fits a given data set. Parameters are estimated by optimising the fit, as measured by the chosen score function, and model choice is guided by the size of the scores for the different models. Since different score functions summarise the fit in different ways, it is important to choose a function which matches the objectives of the data mining exercise. For predictive classification problems, a wide variety of score functions exist, including measures such as precision and recall, the F measure, misclassification rate, the area under the ROC curve (the AUC), and others. The first four of these require a classification threshold to be chosen, a choice which may not be easy, or may even be impossible, especially when the classification rule is to be applied in the future. In contrast, the AUC does not require the specification of a classification threshold, but summarises performance over the range of possible threshold choices. However, unfortunately, and despite the widespread use of the AUC, it has a previously unrecognised fundamental incoherence lying at the core of its definition. This means that using the AUC can lead to poor model choice and unnecessary misclassifications. The AUC is set in context, its deficiency explained and the implications illustrated - with the bottom line being that the AUC should not be used. A family of coherent alternative scores is described. The ideas are illustrated with examples from bank loans, fraud, face recognition, and health screening.
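To make the threshold point in the abstract concrete, here is a minimal Python sketch. It is my own illustration, not code from the lecture, and it does not implement the coherent alternative scores the lecture proposes. The scores and labels are made up, and the function names are hypothetical.

```python
def misclassification_rate(scores, labels, threshold):
    """Fraction of cases misclassified when score >= threshold predicts class 1."""
    errors = sum((s >= threshold) != bool(y) for s, y in zip(scores, labels))
    return errors / len(labels)


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count one half. No classification threshold is involved.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.1, 0.4, 0.35, 0.8]   # made-up classifier scores
labels = [0, 0, 1, 1]            # made-up true classes

print(misclassification_rate(scores, labels, 0.38))
print(misclassification_rate(scores, labels, 0.5))
print(auc(scores, labels))
```

With these made-up values the misclassification rate moves from 0.5 to 0.25 as the threshold goes from 0.38 to 0.5, while the AUC is a single number (0.75) for the whole set of scores.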


iPhone iPod delete all music videos applications etc.


To accomplish this, go to iTunes, open the tab for the content you want to delete (e.g. "Music"), and un-check "Sync ..". This should remove that content from the device.



Bibtex Texshop

I always run into this trouble when I compile LaTeX files. Here is the solution / reminder.
P.S. Don't forget to specify \bibliographystyle.

1. Put the citations in your .tex file in the form
\cite{<key>}. Papers you have not cited will not appear
in the bibliography.

2. Put \bibliography{<bibfilename>} in your .tex file
where you want the bibliography to appear. Make sure the .bib file is
somewhere that LaTeX can find it, such as the same folder as the .tex
file.

3. Run latex, then bibtex, then latex, then latex again, all on your .tex
file (actually you run bibtex on the .aux file, but TeXShop does this
for you).

Check the results.
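If you are compiling outside TeXShop, the sequence from step 3 can be scripted. This is only a sketch: it assumes that `latex` and `bibtex` are on your PATH, and the function names and the file stem "paper" are hypothetical.

```python
import subprocess


def bibtex_build_commands(stem):
    """Return the four commands from step 3 for a document named <stem>.tex."""
    return [
        ["latex", stem],    # first pass: records the cited keys in <stem>.aux
        ["bibtex", stem],   # reads <stem>.aux and writes <stem>.bbl
        ["latex", stem],    # second pass: pulls the bibliography into the document
        ["latex", stem],    # third pass: resolves the citation labels
    ]


def build(stem):
    """Run the sequence, stopping at the first command that fails."""
    for command in bibtex_build_commands(stem):
        subprocess.run(command, check=True)
```

Calling build("paper") in the folder that holds paper.tex and the .bib file would run the four steps in order.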

For more details, see Appendix B of the LaTeX book, or Chapter 13 of The LaTeX Companion.


Data Visualization



The Grammar of Graphics, Leland Wilkinson
Visualizing Data, William S. Cleveland
The Visual Display of Quantitative Information, Edward Tufte
Information Visualization: Perception for Design, Colin Ware
Show Me the Numbers: Designing Tables and Graphs to Enlighten, Stephen Few



Tableau (pros: one of the best data exploration tools, free for open data; cons: somewhat costly ~$1,700)
Pentaho Reporting
Pivot (Microsoft)

Interactive, brushing, etc.

1d density plot

parallel coordinates plot
ggplot2 (in R)

theonion clickthrus broken down by date and day of the week

Tree Network Tools


Programming Tools
A popular graphics language
Visualization tools for JavaScript
Visualization tools for Flash
Visualization tools for Java
Mapping tools for Flash/JavaScript

People / Blogs

Andrew (blogger)
Nathan (blogger)
Jeffrey Heer (visualization libraries)
Katy Borner (visualization of science)


Some of the recommendations are by Jeffrey Heer (an expert in the area), given at the MediaX 2009 workshop.


smoothScatter produces a smoothed color density representation of the scatterplot, obtained through a kernel density estimate.
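To show what a kernel density estimate of a scatterplot means, here is a rough pure-Python sketch. It is not how R's smoothScatter is implemented. The Gaussian product kernel, the single shared bandwidth, and the function names are all my assumptions.

```python
import math


def gaussian_kde_2d(points, x, y, bandwidth):
    """Kernel density estimate at (x, y) using a product Gaussian kernel."""
    total = 0.0
    for px, py in points:
        u = (x - px) / bandwidth
        v = (y - py) / bandwidth
        total += math.exp(-0.5 * (u * u + v * v))
    # Normalise so the estimate integrates to 1 over the plane.
    return total / (len(points) * 2.0 * math.pi * bandwidth ** 2)


def density_grid(points, xs, ys, bandwidth):
    """Evaluate the estimate on a grid; rows follow ys, columns follow xs."""
    return [[gaussian_kde_2d(points, x, y, bandwidth) for x in xs] for y in ys]
```

Colouring each grid cell by its density value gives the smoothed picture: dense clusters of points show up as dark regions instead of an over-plotted blob.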


Japan Mobile SNS presentation

You cannot install numpy on this volume. numpy requires System Python to install os x




NumPy provides a separate installer file for each version of Python (e.g. 2.5, 2.6). Make sure you download the one that matches your Python.
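A quick way to check which Python you are actually running before picking the download. This is a small sketch and the helper name is hypothetical.

```python
import sys


def python_major_minor(version_info=sys.version_info):
    """Return the 'major.minor' string, e.g. '2.6', to match against the installer name."""
    return "%d.%d" % (version_info[0], version_info[1])


print(python_major_minor())   # version of the interpreter running this script
print(sys.executable)         # path of that interpreter
```

Run it with the same interpreter you intend to use NumPy with; if several Pythons are installed, the path on the second line shows which one answered.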


gwt google app engine gae structure client server package

For more information, see the following guide kindly provided by Google. Here is a partial copy:


Standard Directory and Package Layout

GWT projects are overlaid onto Java packages such that most of the configuration can be inferred from the classpath and the module definitions.


If you are not using the Command-line tools to generate your project files and directories, here are some guidelines to keep in mind when organizing your code and creating Java packages.

  1. Under the main project directory create the following directories:
    • src folder - contains production Java source
    • war folder - your web app; contains static resources as well as compiled output
    • test folder - (optional) JUnit test code would go here
  2. Within the src package, create a project root package and a client package.
  3. If you have server-side code, also create a server package to differentiate the client-side code (which is translated into JavaScript) from the server-side code (which is not).
  4. Within the project root package, place one or more module definitions.
  5. In the war directory, place any static resources (such as the host page, style sheets, or images).
  6. Within the client and server packages, you are free to organize your code into any subpackages you require.
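If you are creating the directories by hand, the layout in steps 1 to 3 can be sketched in a few lines of Python. This is my own helper, not part of GWT or its command-line tools; the project name "MyApp" and the package "com.example.myapp" are hypothetical.

```python
import tempfile
from pathlib import Path


def create_gwt_layout(project_dir, root_package):
    """Create src, war and test, plus the root, client and server packages."""
    project = Path(project_dir)
    package_dir = project / "src" / Path(*root_package.split("."))
    created = [
        package_dir,              # project root package (module XML goes here)
        package_dir / "client",   # code that is translated into JavaScript
        package_dir / "server",   # code that stays on the server
        project / "war",          # static resources and compiled output
        project / "test",         # optional JUnit tests
    ]
    for directory in created:
        directory.mkdir(parents=True, exist_ok=True)
    return created


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    dirs = create_gwt_layout(Path(tmp) / "MyApp", "com.example.myapp")
    relative = sorted(d.relative_to(tmp).as_posix() for d in dirs)
    print("\n".join(relative))
```

The module definition (.gwt.xml) and the host page still have to be added by hand, as steps 4 and 5 describe.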

Example: GWT standard package layout

For example, all the files for the "DynaTable" sample are organized in a main project directory also called "DynaTable".

  • Java source files are in the directory: DynaTable/src/com/google/gwt/sample/dynatable
  • The module is defined in the XML file: DynaTable/src/com/google/gwt/sample/dynatable/DynaTable.gwt.xml
  • The project root package is:
  • The logical module name is:

The src directory

The src directory contains an application's Java source files, the module definition, and external resource files.

Package File Purpose
The project root package contains module XML files.
DynaTable.gwt.xml Your application module. Inherits and adds an entry point class,
Static resources that are loaded programmatically by GWT code. Files in the public directory are copied into the same directory as the GWT compiler output.
logo.gif An image file available to the application code. You might load this file programmatically using this URL: GWT.getModuleBaseURL() + "logo.gif".
Client-side source files and subpackages.
Client-side Java source for the entry-point class.
An RPC service interface.
Server-side code and subpackages.
Server-side Java source that implements the logic of the service.

The war directory

The war directory is the deployment image of your web application. It is in the standard expanded war format recognized by a variety of Java web servers, including Tomcat, Jetty, and other J2EE servlet containers. It contains a variety of resources:

  • Static content you provide, such as the host HTML page
  • GWT compiled output
  • Java class files and jar files for server-side code
  • A web.xml file that configures your web app and any servlets

A detailed description of the war format is beyond the scope of this document, but here are the basic pieces you will want to know about:

Directory File Purpose
DynaTable/war/ DynaTable.html A host HTML page that loads the DynaTable app.
DynaTable/war/ DynaTable.css A static style sheet that styles the DynaTable app.
DynaTable/www/dynatable/  The DynaTable module directory where the GWT compiler writes output and files on the public path are copied. NOTE: by default this directory would be the long, fully-qualified module name. However, in our GWT module XML file we used the rename-to="dynatable" attribute to shorten it to a nice name.
DynaTable/www/dynatable/ dynatable.nocache.js The "selection script" for DynaTable. This is the script that must be loaded from the host HTML to load the GWT module into the page.
DynaTable/war/WEB-INF  All non-public resources live here; see the servlet specification for more detail.
DynaTable/war/WEB-INF web.xml Configures your web app and any servlets.
DynaTable/war/WEB-INF/classes  Java compiled class files live here to implement server-side functionality. If you're using an IDE, set the output directory to this folder.
DynaTable/war/WEB-INF/lib  Any library dependencies your server code needs go here.
DynaTable/war/WEB-INF/lib gwt-servlet.jar If you have any servlets using GWT RPC, you will need to place a copy of gwt-servlet.jar here.

The test directory

The test directory contains the source files for any JUnit tests.

Package File Purpose
Client-side test files and subpackages.
Test cases for the entry-point class.
Server-side test files and subpackages.
Test cases for server classes.




Problem: Cross-site RPC seemed to work with JSON but not with XML
Solution: Strip out whitespace characters (including new lines)
Keywords: XML GWT AJAX XML JSON Google App Engine python cross site web service
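A minimal sketch of the whitespace stripping on the Python side. I am assuming the whitespace that matters is the whitespace between tags and at both ends of the document; the function name is hypothetical.

```python
import re


def strip_xml_whitespace(xml_text):
    """Drop whitespace between tags and at both ends, newlines included."""
    return re.sub(r">\s+<", "><", xml_text.strip())


print(strip_xml_whitespace("<a>\n  <b>hi there</b>\n</a>\n"))
```

Whitespace inside text content ("hi there") is left alone, so only the pretty-printing is removed.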

Google App Engine: Running Python and Java side by side


Suppose you want to run both Java and Python within the same application.
(Note: this really only makes sense if you want to use common services such as the datastore, memcache, queues, etc. If not, just deploy them as separate applications, which doubles your quota, and let them communicate through web services.)


You can simply deploy them to different versions of the same application. Note that versions don't have to be numeric: you can deploy your Java code to version "java" and your Python code to version "py", and each version is then reachable at its own URL.

The Java and Python versions can communicate with each other by using JSON (more precisely JSONP, for cross-site requests).
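For the Python side, wrapping a response as JSONP can be sketched as follows. The function name, the callback name, and the payload are hypothetical, and wiring this into an actual request handler is left out.

```python
import json


def jsonp_response(callback, payload):
    """Wrap a JSON payload in a call to the client-supplied callback name."""
    # Only accept plain identifiers so the callback cannot inject script.
    if not callback.replace("_", "").isalnum():
        raise ValueError("unsafe callback name: %r" % callback)
    return "%s(%s);" % (callback, json.dumps(payload))


print(jsonp_response("handleData", {"ok": True}))
```

The client loads the URL through a script tag and passes the callback name as a query parameter; the browser then executes the returned call, which is what gets around the same-origin restriction.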

Using GWT also makes this job somewhat easier



gae app same both java python simultaneously java and python both java and python app id appid together google app engine gwt




WARNING: Failed startup of context error in opening zip file


Powered by jetty://
Jul 3, 2009 10:58:46 AM warn
WARNING: Failed startup of context{/,/Volumes/TRASCEND/docs/neil/Research/GroupFormation/code/GAE/t1/war} error in opening zip file
	at Method)
	at java.util.jar.JarFile.<init>(
	at java.util.jar.JarFile.<init>(
	at org.mortbay.jetty.webapp.TagLibConfiguration.configureWebApp(
	at org.mortbay.jetty.webapp.WebAppContext.startContext(
	at org.mortbay.jetty.handler.ContextHandler.doStart(
	at org.mortbay.jetty.webapp.WebAppContext.doStart(
	at org.mortbay.component.AbstractLifeCycle.start(
	at org.mortbay.jetty.handler.HandlerWrapper.doStart(
	at org.mortbay.component.AbstractLifeCycle.start(
	at org.mortbay.jetty.handler.HandlerWrapper.doStart(
	at org.mortbay.jetty.Server.doStart(
	at org.mortbay.component.AbstractLifeCycle.start(
The server is running at http://localhost:8080/
2009-07-03 19:58:46.975 java[3373:80f] [Java CocoaComponent compatibility mode]: Enabled
2009-07-03 19:58:46.976 java[3373:80f] [Java CocoaComponent compatibility mode]: Setting timeout for SWT to 0.100000
SCFinderPlugin(114): Unable to get bundle identifier.
SCFinderPlugin(114): Unable to get bundle identifier.
SCFinderPlugin(114): Unable to get bundle identifier.


It seems to be caused by the "._" (dot underscore) files that OS X creates when a non-OS X partition is used.
I re-created the project on the Mac partition and, to my surprise, that fixed the problem
(so much for paying premium for an increased productivity on mac)
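If moving the project is not an option, a small script can at least list the "._" files so they can be inspected or deleted. This is a sketch; the function name is hypothetical.

```python
import os


def find_dot_underscore_files(root):
    """Return the paths under root whose file name starts with '._'."""
    matches = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            if name.startswith("._"):
                matches.append(os.path.join(directory, name))
    return sorted(matches)
```

Pointing it at the project's war directory would show whether any such files sit next to the jars under WEB-INF/lib.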

