

google.appengine.api.datastore_errors.BadArgumentError: _app must not be empty.

The error occurred when calling db.put().

This is probably caused by running your program outside of App Engine.

Run it from the interactive console: http://localhost:8080/_ah/admin/interactive
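Another workaround worth noting: the datastore derives _app from the APPLICATION_ID environment variable, which dev_appserver normally sets for you. Setting it by hand before touching the datastore API avoids the error in standalone scripts (the app id below is a placeholder):

```python
import os

# dev_appserver sets APPLICATION_ID automatically; a standalone script must
# set it itself before using the datastore API.  'myapp' is a placeholder.
os.environ['APPLICATION_ID'] = 'myapp'
```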



google.appengine.api.datastore_errors.BadValueError: Property identifier is not multi-line



class StringProperty(verbose_name=None, multiline=False, ...)

    A short string property. Takes a Python str or unicode (basestring) value of 500 bytes or less.

    StringProperty property values are indexed, and can be used in filters and sort orders.

    If multiline is False, then the value cannot include linefeed characters. The djangoforms library uses this to enforce a difference between text fields and textarea fields in the data model, and others can use it for a similar purpose.

    Value type: str or unicode
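The fix is to declare the property with multiline=True or to switch to db.TextProperty. For intuition, the check is roughly the following (a simplified re-implementation of mine, not the SDK source):

```python
def validate_string_property(value, multiline=False):
    # Roughly what db.StringProperty does on assignment: reject newlines
    # unless multiline=True (the SDK raises BadValueError here).
    if not multiline and '\n' in value:
        raise ValueError('Property is not multi-line')
    # The SDK also enforces the 500-byte limit on indexed strings.
    if len(value.encode('utf-8')) > 500:
        raise ValueError('Property is longer than 500 bytes')
    return value
```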

Python SAX

xml.sax._exceptions.SAXParseException: not well-formed (invalid token)

UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position: unexpected code byte

1) My file was not encoded in UTF-8 (you can use Eclipse to check/change the encoding)
2) Added the following line to the data file:
<?xml version="1.0" encoding="UTF-8"?>

        #parser.parse(codecs.open(..., "r", "utf-8"))
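The byte 0x9a is itself a hint: it can never start a UTF-8 sequence, but it is a printable character in legacy Windows encodings. A quick way to confirm (cp1252 as the source encoding is my guess; yours may differ):

```python
# 0x9a is a UTF-8 continuation byte and cannot start a sequence,
# hence the UnicodeDecodeError.
raw = b'caf\x9a'          # stand-in for a byte sequence from the real file
try:
    raw.decode('utf-8')
except UnicodeDecodeError:
    # Assumption: the file was really cp1252, where 0x9a is 's with caron'.
    text = raw.decode('cp1252')
```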

Error: xml.sax._exceptions.SAXParseException:  junk after document element

Fix: added <ListRecords> and </ListRecords> to the beginning and the end of the document respectively, so that all records share a single root element.
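The cause: a well-formed XML document may have only one root element, so a file of concatenated records trips the parser on the second one. A minimal sketch of the wrapping fix (the element names here are illustrative):

```python
import xml.sax

class RecordCounter(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.records = 0

    def startElement(self, name, attrs):
        if name == 'record':
            self.records += 1

# Two sibling elements with no common root -> "junk after document element".
fragments = '<record/><record/>'
wrapped = '<ListRecords>%s</ListRecords>' % fragments

counter = RecordCounter()
xml.sax.parseString(wrapped.encode('utf-8'), counter)
```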


Google App Engine remote_api

The example provided in the docs for the Google App Engine remote_api did not quite work for me right away.  Here are some modifications that made it work.

First of all (thanks to the following post), use the following code instead (paths are for OS X):

import code
import getpass
import os
import sys

DIR_PATH = "/Applications/google_appengine"

EXTRA_PATHS = [
  DIR_PATH,
  os.path.join(DIR_PATH, 'lib', 'yaml', 'lib'),
]

sys.path = EXTRA_PATHS + sys.path

from google.appengine.ext.remote_api import remote_api_stub
from google.appengine.ext import db

def auth_func():
  return raw_input('Username:'), getpass.getpass('Password:')

if len(sys.argv) < 2:
  print "Usage: %s app_id [host]" % (sys.argv[0],)
  sys.exit(1)
app_id = sys.argv[1]
if len(sys.argv) > 2:
  host = sys.argv[2]
else:
  host = '%s.appspot.com' % app_id

remote_api_stub.ConfigureRemoteDatastore(app_id, '/remote_api', auth_func, host)

code.interact('App Engine interactive console for %s' % (app_id,), None, locals())

Don't forget to comment out (or change from *) the catch-all handler in app.yaml:

#- url: /.*
#  script:

- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin


Otherwise you will get the following error:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Applications/google_appengine/google/appengine/ext/db/", line 1336, in __iter__
  File "/Applications/google_appengine/google/appengine/ext/db/", line 1742, in run
    query_run = self._proto_query.Run(*self._args, **self._kwds)
  File "/Applications/google_appengine/google/appengine/ext/gql/", line 657, in Run
    res = bind_results.Get(self.__limit, offset)
  File "/Applications/google_appengine/google/appengine/api/", line 942, in Get
    return self._Run(limit, offset)._Next(limit)
  File "/Applications/google_appengine/google/appengine/api/", line 1536, in _Next
    apiproxy_stub_map.MakeSyncCall('datastore_v3', 'Next', req, result)
  File "/Applications/google_appengine/google/appengine/api/", line 68, in MakeSyncCall
    apiproxy.MakeSyncCall(service, call, request, response)
  File "/Applications/google_appengine/google/appengine/api/", line 240, in MakeSyncCall
    stub.MakeSyncCall(service, call, request, response)
  File "/Applications/google_appengine/google/appengine/ext/remote_api/", line 181, in MakeSyncCall
    handler(request, response)
  File "/Applications/google_appengine/google/appengine/ext/remote_api/", line 219, in _Dynamic_Next
    'remote_datastore', 'RunQuery', request, query_result)
  File "/Applications/google_appengine/google/appengine/ext/remote_api/", line 147, in MakeSyncCall
  File "/Applications/google_appengine/google/appengine/tools/", line 344, in Send
    f =
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/", line 389, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/", line 427, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/", line 361, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found

The guestbook example gave the following error:

>>> import helloworld
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/neil/Documents/workspace/helloworld/src/", line 5, in <module>
    from google.appengine.ext import webapp
  File "/Applications/google_appengine/google/appengine/ext/webapp/", line 68, in <module>
    import webob
ImportError: No module named webob
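The ImportError can be worked around by putting the SDK's bundled copy of webob on sys.path (lib/webob as the location is my assumption about the SDK layout; check your install):

```python
import os
import sys

DIR_PATH = "/Applications/google_appengine"   # SDK location, from the traceback

# The SDK bundles its third-party dependencies under lib/; webob is one of
# them, so making it importable resolves the error above.
sys.path.insert(0, os.path.join(DIR_PATH, 'lib', 'webob'))
```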

Instead of using the somewhat convoluted example provided by the docs, just copy and paste the following sections, separated by bold comments (make sure to paste them separately):

from google.appengine.ext import db

class Greeting(db.Model):
  author = db.UserProperty()
  content = db.StringProperty(multiline=True)
  date = db.DateTimeProperty(auto_now_add=True)

# Now make sure to hit enter a couple of times to get out of indent mode

q = db.GqlQuery("SELECT * FROM Greeting ORDER BY date DESC LIMIT 10")
results = q.fetch(5)

# Enter the username and password

for r in results:
    print r.content





CiteSeerX DataSet


Steps for downloading the full dataset from CiteSeerX:

Gregor has kindly provided a CiteSeerX fetcher that works much better than the old approach.  Thanks :)


P.S.  I am in the process of setting up a new web server at the lab, so I will try to make a whole dump available one of these days.  If you need it sooner, email me.



Great thanks to this blog post

I have added slightly more details and made some minor changes (that seemed to work better in my case).

  1. Download and extract the "Demo" from
  2. Download xerces and place xerces.jar in the same directory as the Demo
  3. Go to the Demo directory, type the following command (all in one line) to download the full dataset of CiteSeerX to the file "citeseerx_alldata.xml"
    java -classpath .:oaiharvester.jar:xerces.jar org.acme.oai.OAIReaderRawDump -o citeseerx_alldata.xml

    Note: most likely you will not see anything right away, so you may check the file citeseerx_alldata.xml to make sure that things are being added to it.  The size of the data is about 520 MB, so it may take a while.

For more information you may also want to look at the original website for the OAI harvester here

keywords: citeseer citeseerx data set dataset dump download database db citation analysis link analysis

Link Analysis Software

Some notes on link analysis software:

looks very nice; python

quite nice python
seems rather nice, scalable, java
java, seems very interesting
looks interesting too

A plugin for Microsoft Excel that provides very nice tools (especially useful for people who are not familiar with programming)


An up-and-coming open source graph visualization and analysis platform written in Java (with support for plugins).






Comparison between igraph and networkx [Conrad Lee]

Some differences I could point to:

   - iGraph has some community detection algorithms implemented, while
   NetworkX does not.
   - iGraph's GraphML exporter includes a more complete implementation of
   the GraphML specification, meaning that if you have a graph with all sorts
   of things labeled and weighted, it might be easier to export all this data
   into GraphML with iGraph.

Another difference between the two packages (and the reason I prefer NetworkX) is that NetworkX is well documented and has a fairly active community that can answer questions.  I found that the Python version of iGraph was not very well documented.  Also, NetworkX is written more in pure Python than iGraph: if you are using the Python version of iGraph, you will usually not be able to read the source code in Python -- you will just run into a binding.
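To make the GraphML point concrete, here is a stdlib-only sketch of what a "complete" export has to carry: every attribute (such as an edge weight) must be declared up front with a <key> element, which is exactly the part incomplete exporters tend to drop. The function and its API are mine, not from either library:

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = 'http://graphml.graphdrawing.org/xmlns'

def graph_to_graphml(edges):
    """edges: iterable of (source, target, weight) triples."""
    ET.register_namespace('', GRAPHML_NS)
    ns = '{%s}' % GRAPHML_NS
    root = ET.Element(ns + 'graphml')
    # GraphML requires every attribute to be declared with a <key>;
    # without it, weights are silently lost on round-trip.
    ET.SubElement(root, ns + 'key', id='w',
                  **{'for': 'edge', 'attr.name': 'weight', 'attr.type': 'double'})
    graph = ET.SubElement(root, ns + 'graph', edgedefault='undirected')
    nodes = {n for e in edges for n in e[:2]}
    for n in sorted(nodes):
        ET.SubElement(graph, ns + 'node', id=str(n))
    for u, v, w in edges:
        edge = ET.SubElement(graph, ns + 'edge', source=str(u), target=str(v))
        ET.SubElement(edge, ns + 'data', key='w').text = str(w)
    return ET.tostring(root, encoding='unicode')

xml_doc = graph_to_graphml([('a', 'b', 1.5), ('b', 'c', 0.5)])
```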



PDF annotations comments linux ubuntu


Mendeley has a built-in PDF editor/annotator (Linux, Mac, Windows).  To persist your comments to the PDF, make sure to select File / "Export PDF with Annotations".
Mendeley also works great for its original purpose -- organizing research papers.


In Adobe Acrobat click on: Comments / Enable for Commenting and Analysis in Adobe Reader

Now you can annotate documents with Adobe Reader in Linux/Ubuntu :)

keywords: PDF annotations comments linux ubuntu highlight


> "We need to enable commenting for Acrobat Reader 8.  Is there a way to batch process these files instead of using Acrobat 8 to open each file and enable them manually?"

"Using Adobe LiveCycle Reader Extensions Server - yes, there is."

"Finally sorted this one… well if you use a mac, try using the 'Watch me do' function in automator!"


Tricky LyX Error

Error Message:

Missing $ inserted
I've inserted a begin-math/end-math symbol since I think
you left one out. Proceed, with fingers crossed.

The error was caused by having \oslash in the TeX code of my math macro in LyX in place of \null.


keywords: lyx cross reference math macro begin-math end-math error left one out

Lab Budget 2008

This year I have my first research budget to spend.  I am a strong believer in open disclosure (public accountability) of the use of any public (tax) funds, so I am including justifications for my decisions.


I often need to run extensive simulations, therefore I need access to some computing power.

I personally would prefer to utilize a computing cloud (e.g. Amazon's elastic cloud).  Quite unfortunately, due to budget spending restrictions, I am not allowed to do that.  So this leaves me no option but to buy some servers.

Intel just introduced the new Nehalem architecture, which significantly outperforms the previous architecture (Penryn) [wiki]:

  • 1.1x to 1.25x the single-threaded performance or 1.2x to 2x the multithreaded performance at the same power level
  • 30% lower power usage for the same performance
  • According to a preview from AnandTech "expect a 20-30% overall advantage over Penryn with only a 10% increase in power usage. It looks like Intel is on track to delivering just that in Q4."[10]
  • Core-wise, clock-for-clock, Nehalem will provide a 15%-20% increase in performance compared to Penryn. [1]

It is due mainly to [wiki]:

  • Integrated memory controller supporting two or three memory channels of DDR3 SDRAM or four FB-DIMM channels
  • A new point-to-point processor interconnect QuickPath, replacing the legacy front side bus
  • Simultaneous multithreading by multiple cores and hyperthreading (2x per core).

Unfortunately the Nehalem-based server CPU (Xeon) is not available yet.  The desktop Core i7 CPU is available, and outperforms the current Harpertown at lower cost (especially considering the cheaper cost of DDR3 memory in comparison with FB-DIMM DDR2).

Using a desktop architecture instead of a server architecture may not provide the desired stability, but for non-crucial applications it is not likely to be an issue.

Of the main vendors, Dell seems to be the only one currently offering the Core i7, so I decided to go with the Dell Studio XPS Desktop.  I am ordering a base system and then separately ordering additional memory, hard drives, etc., since Dell overcharges for upgrades by quite a large margin (e.g. 12 GB of memory costs $450 from Dell, while I can purchase it online for $216 (6 x $36); that's a very unreasonable markup of over 100%).


Credit-based Cloud Computing

Update: Existing Solutions

Looks like there are already some existing solutions, such as:

appears to be rather close to my needs: credit is given for computing resources, which can be claimed as needed at a later time.

Currently that is the only solution from the following wiki list:

Software implementations and middleware


Peer-to-peer grid



  1. Using cloud resources is costly.
  2. My servers are idling most of the time.


Allow others to use my servers in exchange for future credit to be used on the servers of other participants.  The company that runs it could charge a percentage for the service.  I know that I would personally pay up to 50% of the CPU credits.  Seems like a rather useful and profitable idea.  Too bad this is not the area of my expertise.  Hopefully somebody will implement it sometime soon.
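The mechanics of the proposal can be sketched as a toy credit ledger (entirely my own illustration; the 50% figure is the service fee I said I would tolerate):

```python
class CreditLedger:
    """Toy model: contribute CPU-hours to earn credit, claim it later on
    other participants' machines; the operator keeps a service fee."""

    def __init__(self, service_fee=0.5):
        self.service_fee = service_fee
        self.balances = {}

    def contribute(self, user, cpu_hours):
        # Credit earned is what remains after the operator's cut.
        earned = cpu_hours * (1.0 - self.service_fee)
        self.balances[user] = self.balances.get(user, 0.0) + earned
        return earned

    def claim(self, user, cpu_hours):
        # Spend previously earned credit on someone else's idle machines.
        if self.balances.get(user, 0.0) < cpu_hours:
            raise ValueError('insufficient credit')
        self.balances[user] -= cpu_hours
        return self.balances[user]

ledger = CreditLedger()
```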

<<< TODO >>>

[Peer Computing]

existing peer networks share files and network bandwidth; why not also CPU?

peer credit-based cloud computing


• in many organizations computers are often idling

• when CPU power is needed:

– it is costly (may offload to a cloud, e.g. Amazon); sometimes not feasible, e.g. when budget restrictions forbid paying for computational resources

Solution: share your computational resources when they are not needed, and receive computational credit

