HRSTC [HeuRiSTiC]

Problem:
Occurred when calling db.put

Reason:
This is probably caused by running your program outside of App Engine.

Solution:
Run it from the interactive console: http://localhost:8080/_ah/admin/interactive

Error:

Reason:

class StringProperty(verbose_name=None, multiline=False, ...)

A short string property. Takes a Python str or unicode (basestring) value of 500 bytes or less.

StringProperty property values are indexed, and can be used in filters and sort orders.

If multiline is False, then the value cannot include linefeed characters. The djangoforms library uses this to enforce a difference between text fields and textarea fields in the data model, and others can use it for a similar purpose.

Value type: str or unicode
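
The two constraints quoted above (length limit, no linefeeds unless multiline) can be mimicked in plain Python to check a value before calling db.put. This is only a sketch of the documented rules, not App Engine's own validation: the helper name check_short_string and the choice to measure the length on the UTF-8 encoding are my assumptions.

```python
def check_short_string(value, multiline=False, max_bytes=500):
    """Return the reasons why value would not fit a StringProperty.

    Hypothetical helper: it mirrors the documented limits only and
    does not call any App Engine code.
    """
    problems = []
    # Assumption: the limit is measured on the UTF-8 encoding of the value.
    if len(value.encode("utf-8")) > max_bytes:
        problems.append("longer than %d bytes" % max_bytes)
    # With multiline=False a linefeed is not allowed.
    if not multiline and "\n" in value:
        problems.append("contains a linefeed but multiline is False")
    return problems


print(check_short_string("hello"))
print(check_short_string("line one\nline two"))
print(check_short_string("x" * 501, multiline=True))
```

An empty list means the value passes both checks; longer text would need a different property type.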

Python SAX

xml.sax._exceptions.SAXParseException: not well-formed (invalid token)

UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position: unexpected code byte

1) My file was not encoded in UTF-8 (Eclipse can be used to check or change the encoding)
2) Added the following line to the data file:
<?xml version="1.0" encoding="UTF-8"?>

parser.parse(open(fileName))
#parser.parse(codecs.open(fileName, "r", "utf-8"))

3)
Error: xml.sax._exceptions.SAXParseException:  junk after document element

Solution:
Added <ListRecords> and </ListRecords> to the beginning and the end of the document respectively, so that the document has a single root element
[http://mail.python.org/pipermail/python-list/2002-November/172310.html]
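
The "junk after document element" error can be reproduced, and the fix checked, with a few lines of Python 3. A minimal sketch: the <record> elements and the helper names are made up for illustration; only the <ListRecords> wrapper comes from my data.

```python
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Counts <record> elements as the parser encounters them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "record":
            self.count += 1


def count_records(xml_bytes):
    """Parse xml_bytes with SAX and return the number of <record> elements."""
    handler = RecordCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count


def parses_ok(xml_bytes):
    """Return True if the document is well-formed, False otherwise."""
    try:
        count_records(xml_bytes)
        return True
    except xml.sax.SAXParseException:
        return False


# Two top-level elements: everything after the first one is "junk".
fragment = b"<record>a</record><record>b</record>"

# The same data wrapped in a single root element, with the encoding declared.
wrapped = (b'<?xml version="1.0" encoding="UTF-8"?>'
           b"<ListRecords>" + fragment + b"</ListRecords>")

print(parses_ok(fragment))
print(count_records(wrapped))
```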

Refs:
http://evanjones.ca/python-utf8.html
http://bytes.com/groups/python/818634-problem-parsing-utf-8-encoded-xml-minidom

The example provided in the docs for Google App Engine remote_api did not quite work for me right away.  Here are some modifications that will make it work.

First of all (thanks to the post linked in the code), use the following code (for OS X) for appengine_console.py instead:

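#!/usr/bin/python
# http://allen.hutchison.org/2009/03/appengine-remoteapi-example-on-os-x.html
import code
import getpass
import os
import sys

# Assumption: SDK install location, taken from the traceback paths
# further down in this post; adjust it to match your install.
DIR_PATH = '/Applications/google_appengine'

EXTRA_PATHS = [
    DIR_PATH,
    os.path.join(DIR_PATH, 'lib', 'yaml', 'lib'),
]

# The SDK must be on sys.path before the google.appengine imports below.
sys.path = EXTRA_PATHS + sys.path

from google.appengine.ext.remote_api import remote_api_stub
from google.appengine.ext import db

def auth_func():
    # Prompt for the credentials of an admin of the app.
    return raw_input('Username:'), getpass.getpass('Password:')

if len(sys.argv) < 2:
    print "Usage: %s app_id [host]" % (sys.argv[0],)
    sys.exit(1)
app_id = sys.argv[1]
if len(sys.argv) > 2:
    host = sys.argv[2]
else:
    host = '%s.appspot.com' % app_id

remote_api_stub.ConfigureRemoteDatastore(app_id, '/remote_api', auth_func, host)

code.interact('App Engine interactive console for %s' % (app_id,), None, locals())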

Don't forget to comment out the catch-all helloworld.py handler in app.yaml (or change its url from /.*), so that it does not catch requests to /remote_api:

handlers:
#- url: /.*
#  script: helloworld.py

- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin

Otherwise you will get the following error:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Applications/google_appengine/google/appengine/ext/db/__init__.py", line 1336, in __iter__
    return self.run()
  File "/Applications/google_appengine/google/appengine/ext/db/__init__.py", line 1742, in run
    query_run = self._proto_query.Run(*self._args, **self._kwds)
  File "/Applications/google_appengine/google/appengine/ext/gql/__init__.py", line 657, in Run
    res = bind_results.Get(self.__limit, offset)
  File "/Applications/google_appengine/google/appengine/api/datastore.py", line 942, in Get
    return self._Run(limit, offset)._Next(limit)
  File "/Applications/google_appengine/google/appengine/api/datastore.py", line 1536, in _Next
    apiproxy_stub_map.MakeSyncCall('datastore_v3', 'Next', req, result)
  File "/Applications/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 68, in MakeSyncCall
    apiproxy.MakeSyncCall(service, call, request, response)
  File "/Applications/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 240, in MakeSyncCall
    stub.MakeSyncCall(service, call, request, response)
  File "/Applications/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 181, in MakeSyncCall
    handler(request, response)
  File "/Applications/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 219, in _Dynamic_Next
    'remote_datastore', 'RunQuery', request, query_result)
  File "/Applications/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 147, in MakeSyncCall
    request_pb.Encode()))
  File "/Applications/google_appengine/google/appengine/tools/appengine_rpc.py", line 344, in Send
    f = self.opener.open(req)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 389, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found

The guestbook example gave the following error:

>>> import helloworld
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/neil/Documents/workspace/helloworld/src/helloworld.py", line 5, in <module>
    from google.appengine.ext import webapp
  File "/Applications/google_appengine/google/appengine/ext/webapp/__init__.py", line 68, in <module>
    import webob
ImportError: No module named webob

Instead of using the somewhat convoluted example provided by the docs, just copy and paste the following sections, which are separated by comments (make sure to paste them separately):

from google.appengine.ext import db

class Greeting(db.Model):
    author = db.UserProperty()
    content = db.StringProperty(multiline=True)
    date = db.DateTimeProperty(auto_now_add=True)

# Now make sure to hit enter a couple of times to get out of indent mode

q = db.GqlQuery("SELECT * FROM Greeting ORDER BY date DESC LIMIT 10")
results = q.fetch(5)

# Enter the username and password

for r in results:
    print r.content

CiteSeerX DataSet

Steps for downloading the full dataset from CiteSeerX:

Update: Gregor has kindly provided a CiteSeerX fetcher that works much better than the old approach. Thanks :)

P.S. I am in the process of setting up a new web server at the lab, so I will try to make a whole dump available one of these days. If you need it sooner, email me.

Old

Many thanks to this blog post. I have added slightly more detail and made some minor changes (that seemed to work better in my case).

1. Download and extract the "Demo" from http://purl.oclc.org/NET/OPENSRC/downloads/oaiharvester/jars/oaiharvesterdemo.tar
2. Download xerces and place xerces.jar in the same directory as Demo: http://www.apache.org/dist/xerces/j/Xerces-J-bin.2.9.1.zip
3. Go to the Demo directory and type the following command (all on one line) to download the full dataset of CiteSeerX to the file "citeseerx_alldata.xml":

java -classpath .:oaiharvester.jar:xerces.jar org.acme.oai.OAIReaderRawDump http://citeseerx.ist.psu.edu/oai2 -o citeseerx_alldata.xml

Note: most likely you will not see any output right away, so you may want to check the file citeseerx_alldata.xml to make sure that things are being added to it. The size of the data is about 520 MB, so it may take a while.

For more information you may also want to look at the original website for the OAI harvester here

keywords: citeseer citeseerx data set dataset dump download database db citation analysis link analysis

Link Analysis Software

Some notes on link analysis software:

http://en.wikipedia.org/wiki/Social_network_analysis_software [looks very nice]
http://www.kdnuggets.com/software/link-analysis-social-networks.html
*** http://networkx.lanl.gov looks very nice; python
*** http://igraph.sourceforge.net/ quite nice python
http://jung.sourceforge.net/applet/showlayouts2.html seems rather nice, scalable, java
http://www.sonivis.org java, seems very interesting
http://nwb.slis.indiana.edu/index.html looks interesting too

NodeXL
A plugin for Microsoft Excel that provides very nice tools (especially useful for people who are not familiar with programming)

Gephi
Up-and-coming open source graph visualization and analysis platform written in Java (w/ support for plugins).

Notes:

Comparison between igraph and networkx [Conrad Lee]

Some difference I could point to:
- iGraph has some community detection algorithms implemented, while NetworkX does not.
- iGraph's GraphML exporter included a more complete implementation of the GraphML specification, meaning that if you have a graph with all sorts of things labeled and weighted, it might be easier to export all this data into GraphML with iGraph.

Another difference between the two packages (and the reason I prefer NetworkX), is that NetworkX is well-documented, and has a fairly active community that can answer questions. I found that the python version of iGraph was not very well documented. Also, NetworkX is written more in python than iGraph. If you are using the python version of iGraph, then you will usually not be able to read the source code in python--you will just run into a binding.

PDF annotations comments linux ubuntu

Mendeley has a built in pdf editor/annotator (linux, mac, windows). To persist your comments to pdf make sure to select File/"Export PDF with Annotations". Mendeley also works great for its original purpose -- organizing research papers.

In Adobe Acrobat click on:
Comments / Enable for Commenting and Analysis in Adobe Reader

Now you can annotate documents with Adobe Reader in Linux/Ubuntu :)

keywords: PDF annotations comments linux ubuntu highlight

> "We need to enable commenting for Acrobat Reader 8. Is there a way to batch process these files instead of using Acrobat 8 to open each file and enable them manually?"

"Using Adobe LiveCycle Reader Extensions Server - yes, there is."

"Finally sorted this one… well if you use a mac, try using the 'Watch me do' function in automator!"

Tricky LyX Error

Error Message: Missing $ inserted
asdfasdf\ref{sec:asdf}
I've inserted a begin-math/end-math symbol since I think
you left one out. Proceed, with fingers crossed.

Solution:
The error was caused by having \oslash in my math macro TeX, LyX for the \null

Keywords:

lyx cross reference math macro
lyx cross reference math macro begin-math end-math error left one out

Lab Budget 2008

This year I have my first research budget to spend.  I am a strong believer in open disclosure (public accountability) of the use of any public (tax) funds, so I am including justifications for my decisions.

Server

I often need to run extensive simulations, so I need access to some computing power.

I would personally prefer to use a computing cloud (e.g. Amazon's elastic cloud).  Unfortunately, due to budget spending restrictions, I am not allowed to do that, which leaves me no option but to buy some servers.

Intel just introduced the new Nehalem architecture, which significantly outperforms the previous architecture (Penryn)  [wiki]:

• 1.1x to 1.25x the single-threaded performance or 1.2x to 2x the multithreaded performance at the same power level
• 30% lower power usage for the same performance
• According to a preview from AnandTech "expect a 20-30% overall advantage over Penryn with only a 10% increase in power usage. It looks like Intel is on track to delivering just that in Q4."[10]
• Core-wise, clock-for-clock, Nehalem will provide a 15%-20% increase in performance compared to Penryn. [1]

It is due mainly to [wiki]:

• Integrated memory controller supporting two or three memory channels of DDR3 SDRAM or four FB-DIMM channels
• A new point-to-point processor interconnect QuickPath, replacing the legacy front side bus

Unfortunately, the Nehalem-based server CPU (Xeon) is not available yet.  The desktop Core i7 CPU is available and outperforms the current Harpertown at lower cost (especially considering the lower cost of DDR3 memory compared with FB-DDR2).

Using a desktop architecture instead of a server architecture may not provide the desired stability, but for non-crucial applications this is not likely to be an issue.

Of the main vendors, Dell seems to be the only one currently offering the Core i7, so I decided to go with the Dell Studio XPS Desktop.  I am ordering a base system and then separately ordering additional memory, hard drives, etc., since Dell overcharges for upgrades by quite a large margin (e.g. 12 GB of memory costs $450 from Dell, while I can purchase it online for $216 ($36 x 6); that is a very unreasonable markup of over 100%).
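
The markup figure can be checked with a few lines of arithmetic. This sketch reads "36x6" as six memory modules at $36 each, which is my interpretation of the figures above; the variable names are only for illustration.

```python
dell_price = 450.0        # Dell's price for the 12 GB upgrade, in dollars
modules = 6               # assumption: six modules bought separately
price_per_module = 36.0   # dollars per module when bought online

online_price = modules * price_per_module
markup = (dell_price - online_price) / online_price

print("online: $%.0f, markup: %.0f%%" % (online_price, markup * 100))
# prints: online: $216, markup: 108%
```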

Credit-based Cloud Computing

Update: Existing Solutions

Looks like there are already some existing solutions, such as:

appears to be rather close to my needs: credit is given for computing resources, which can be claimed as needed at a later time.

Currently that is the only solution from the following wiki list:

Software implementations and middleware

ToDo:

Peer-to-peer grid

Problem

1. Using cloud resources is costly.
2. My servers are idling most of the time.

Solution

Let others use my servers in exchange for credit that I can later spend on the servers of other participants.  The company that runs the service could charge a percentage for it.  I know that I would personally pay up to 50% of the CPU credits.  It seems like a rather useful and profitable idea.  Too bad this is not my area of expertise.  Hopefully somebody will implement it sometime soon.
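
To make the idea a bit more concrete, here is a rough sketch of the bookkeeping such a service would need. Everything in it is hypothetical (the CreditLedger class, its method names, measuring credit in CPU-hours); the 50% default fee is simply the upper bound I said I would be willing to pay.

```python
class CreditLedger:
    """Tracks CPU-hour credits per participant (hypothetical design)."""

    def __init__(self, fee_rate=0.5):
        if not 0.0 <= fee_rate <= 1.0:
            raise ValueError("fee_rate must be between 0 and 1")
        self.fee_rate = fee_rate      # share of earned credit kept by the operator
        self.balances = {}            # participant name -> CPU-hours of credit
        self.operator_income = 0.0    # CPU-hours collected as fees

    def earn(self, participant, cpu_hours):
        """Credit a participant whose idle servers ran cpu_hours for others."""
        fee = cpu_hours * self.fee_rate
        new_balance = self.balances.get(participant, 0.0) + cpu_hours - fee
        self.balances[participant] = new_balance
        self.operator_income += fee
        return new_balance

    def spend(self, participant, cpu_hours):
        """Use up credit to run cpu_hours on other participants' servers."""
        balance = self.balances.get(participant, 0.0)
        if cpu_hours > balance:
            raise ValueError("%s has only %.1f CPU-hours of credit"
                             % (participant, balance))
        self.balances[participant] = balance - cpu_hours
        return self.balances[participant]


ledger = CreditLedger(fee_rate=0.5)
print(ledger.earn("my_lab", 100))   # idle servers ran 100 CPU-hours for others
print(ledger.spend("my_lab", 30))   # later, a simulation uses 30 CPU-hours
```

Scheduling, trust between participants and the actual job execution are left out entirely; the sketch only covers the accounting.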

<<< TODO >>>

[Peer Computing]

existing peer networks: share files, network bandwidth; why not also CPU?

peer credit-based cloud computing

Need:

• in many organizations computers are often idling

• when cpu is needed

– costly (may offload to cloud e.g. Amazon); sometimes not feasible: e.g. using the budget to pay for the computational resources

Solution: share your computational resources when they are not needed; and receive computational credit
