Stefan Talpalaru, CTO

using RDF for scientific data sets

The Resource Description Framework (RDF) is a very flexible data model that breaks down the information to its most basic building blocks: statements of the form subject-predicate-object called "triples". These statements and their ability to describe any relation, no matter how complex, make the separate schema we see in relational databases obsolete, allowing the implied schema to naturally follow the changes in the data.

One specific type of resource identifier, the URI, is the key to creating a self-describing data set. If all the predicate URIs are also valid URLs pointing to web pages describing those predicates, then anyone who comes in contact with the serialized RDF data will be able to make sense of it. The data can become self-documenting, facilitating the kind of collaboration that happens in the scientific community, for example. Furthermore, if some of those URIs point to external data sets we have interconnectivity, turning into the wet dream of semantic web proponents - the global graph.
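
To make the triple idea concrete, here is a minimal sketch in plain Python, with no RDF library involved. The `http://example.org/` URIs, the region names and the `match` helper are all made up for illustration; a real triple store would be queried through SPARQL instead.

```python
# A toy in-memory triple store; every URI below is hypothetical.
EX = "http://example.org/brain/"

triples = {
    (EX + "regionA", EX + "partOf", EX + "regionC"),
    (EX + "regionA", EX + "projectsTo", EX + "regionB"),
    (EX + "regionB", EX + "projectsTo", EX + "regionA"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the sorted triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# A new kind of relation needs no schema change, just one more statement.
triples.add((EX + "regionB", EX + "hasLabel", "Region B"))

projections = match(triples, predicate=EX + "projectsTo")
about_region_b = match(triples, subject=EX + "regionB")
```

The point of the sketch is that the "schema" is nothing more than the set of predicates that happen to appear in the data.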

The Brain Architecture Management System (BAMS) is one real-world example of what RDF can do. The dynamic RDF/XML serialization alone goes a long way towards facilitating data access. Behind the scenes, RDF simplifies the addition of new predicates (compared to the management of columns, indices and foreign keys in an RDBMS) and query construction through SPARQL. This implementation is also the basis for a future OWL ontology on top of RDF and maybe an open SPARQL access point. Yes, I'm excited about it because it's an Odeon project and we got to do the actual migration from MySQL to a triple store, but the wind of change is blowing throughout the scientific community (at least in neuroscience), and people are starting to move towards semantic technologies. Granted, you'll see many static files processed with desktop software like Protégé, but the direction is clear.

Category: RDF

introducing pyopeninviter

When switching from PHP to Python (a long time ago) we felt one important shortcoming for our social components - the lack of a contact importer. There was a great one for PHP - openinviter - but no Python equivalent, so we came up with this wrapper: pyopeninviter. By using the command line PHP interpreter we call the openinviter functions, encode their output as JSON and decode it in Python. The concept is quite simple, really, and the code in api.py is straightforward. What happens in cli.php is a bit uglier, partly because of PHP itself, partly because of the error handling. Take a look at the examples in test.py for the final result and have fun using it.
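
The general pattern (run an external interpreter, get JSON back, decode it) can be sketched like this. This is not the actual code from api.py: the `call_json_cli` helper, passing the request on stdin and the shape of the reply are all assumptions, and a Python one-liner stands in for the PHP script so the sketch runs without PHP installed.

```python
import json
import subprocess
import sys


def call_json_cli(command, payload):
    """Run an external command, send it JSON on stdin, decode JSON from stdout."""
    result = subprocess.run(
        command,
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command exits with an error
    )
    return json.loads(result.stdout)


# Hypothetical stand-in for something like ["php", "cli.php"]:
# it reads the JSON request and echoes a JSON reply.
stand_in = [
    sys.executable,
    "-c",
    "import json, sys; req = json.load(sys.stdin); "
    "print(json.dumps({'provider': req['provider'], 'contacts': []}))",
]

reply = call_json_cli(stand_in, {"provider": "example"})
```

Using JSON as the wire format keeps the two languages decoupled: the PHP side only has to print one well-formed document, errors included.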

Categories: Python pyopeninviter

logging with UTC timestamps from python


By default, the 'logging' module uses the local time for timestamps. Here's how you can make it use UTC with a custom formatter class:

    import logging
    import time

    class UTCFormatter(logging.Formatter):
        # Formatter uses this callable to turn a record's creation time into a struct_time
        converter = time.gmtime # not documented, had to read the module's source code ;-)

    # log to a file, with every timestamp rendered in UTC
    logger = logging.getLogger('foobar')
    logger.setLevel(logging.DEBUG)
    fh = logging.FileHandler('some_log_file')
    fh.setLevel(logging.DEBUG)
    formatter = UTCFormatter('[%(asctime)s] %(message)s', '%d/%b/%Y:%H:%M:%S')
    fh.setFormatter(formatter)
    logger.addHandler(fh)
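
A quick way to check that the formatter really ignores the local timezone is to format a record whose creation time is pinned to the Unix epoch. The hand-built `LogRecord` and its dummy path and line number are only there for the demonstration, and the month name assumes the default C locale.

```python
import logging
import time


class UTCFormatter(logging.Formatter):
    converter = time.gmtime


formatter = UTCFormatter('[%(asctime)s] %(message)s', '%d/%b/%Y:%H:%M:%S')

# Build a record by hand and pin its creation time to the epoch (0 seconds).
record = logging.LogRecord('foobar', logging.INFO, 'example.py', 1, 'hello', None, None)
record.created = 0.0

formatted = formatter.format(record)
print(formatted)  # [01/Jan/1970:00:00:00] hello, whatever the local timezone is
```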

Category: Python

introducing django-cherrypy-odeon


Our favorite Django deployment strategy is to run the WSGI server from cherrypy behind cherokee. Here's the result of this experience - django-cherrypy-odeon. You can now enjoy the stability and low resource consumption of cherrypy as a production server along with a very fast, threaded, development server replacement. No more AJAX problems with concurrent requests while keeping the familiar devserver output. Enjoy the multitude of options and have fun benchmarking processes against threads (spoiler: the GIL sucks).


We stand, of course, on the shoulders of giant snippets. Besides exposing more cherrypy options, we added the 'runserver_cp' command and a patched version of wsgiserver that outputs the requested paths when tickled in a certain way.


The sysadmins among you might want to check out the init scripts provided for the dreaded Debian and the wonderful Gentoo. Pardon the common shell script doing the heavy lifting, but we think that code duplication is a sin.

Category: django-cherrypy-odeon

hunting down the cookie jar

In case you were wondering where the Google App Engine SDK authentication cookie is saved after the first login when running "appcfg.py update", it's in ~/.appcfg_cookies

Category: GAE

keeping an idle SSH connection alive

I've noticed that a long-running SSH session becomes unresponsive after going without input for a certain period. The culprit seems to be my ISP, which kills idle connections for some reason. The fix is simple - set up keep-alive by editing /etc/ssh/ssh_config and adding this:

    Host *
        ServerAliveInterval 15
        ServerAliveCountMax 4

Now the ssh client will ask the server for a sign of life every 15 seconds thus keeping the connection open. As an added bonus, if the server fails to respond 4 times in a row the client gives up and closes the connection itself.
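
To put a number on it: with these settings a dead server is detected after roughly 15 x 4 = 60 seconds. Here is the same arithmetic as a small sketch; the `parse_options` helper is a naive, made-up parser for this three-line snippet, not a faithful ssh_config reader.

```python
CONFIG = """\
Host *
    ServerAliveInterval 15
    ServerAliveCountMax 4
"""


def parse_options(text):
    """Naively split each line into 'keyword value' pairs."""
    options = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            options[parts[0]] = parts[1].strip()
    return options


options = parse_options(CONFIG)

# One probe per interval; the client gives up after count_max unanswered probes.
interval = int(options["ServerAliveInterval"])
count_max = int(options["ServerAliveCountMax"])
seconds_until_disconnect = interval * count_max
```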

Category: Linux

x11vnc init script

If you need remote access to an X server running on a Linux box (the real display, not an Xvfb session) your best bet is x11vnc. It has all the fancy compression protocols and SSL/TLS encryption along with a bunch of authentication schemes. The only thing that's missing is an init script so it can be run as a persistent service. Here's the one I'm using on Gentoo - /etc/init.d/x11vnc:

    #!/sbin/runscript

    depend() {
        need xdm
        after xdm
    }

    start() {
        ebegin "Starting x11vnc"
        start-stop-daemon --start --quiet --pidfile /var/run/x11vnc.pid --make-pidfile --background --exec x11vnc -- -auth guess -display :0 -forever -ssl SAVE -http -unixpw_nis -o /var/log/x11vnc.log -loop
        eend $?
    }

    stop() {
        ebegin "Stopping x11vnc"
        start-stop-daemon --stop --quiet --pidfile /var/run/x11vnc.pid
        eend $?
    }

You'll want to run the x11vnc command manually the first time (as root) to create the SSL certificate and to debug any problem that might appear. Once it's all good, feel free to "/etc/init.d/x11vnc start; rc-update add x11vnc default". It will be started after xdm, so there's an X server to connect to. Because of the '-loop' option it will restart 2 seconds after being closed, so you can log off from your session (thus restarting X and giving back control to the display manager), wait a few seconds, then connect again over VNC and log in. The automation is especially useful if the user can't be bothered to log in over ssh and run a command each time he wants access. In fact, things can be simplified even further by using the Java applet as a viewer - just point the user to http://yourhost:5800/ and all he needs is a Java-enabled browser. The applet uses SSL so it's secure, and the authentication is done with the existing Unix accounts (disregard the 'nis' in '-unixpw_nis', it's just a 'crypt' based method for checking passwords).

Categories: Gentoo Linux

fail2ban and SSH public key authentication

Using fail2ban is a great way to prevent dictionary attacks on SSH but I encountered an unusual problem with it: I sometimes got banned after frequent successful logins. The reason was that I had public key authentication set up for another user on the same host and ssh was trying to use it for all the other accounts before prompting me for a password. The default fail2ban filters put the "Failed publickey" error in the sshd log file on the same level as a failed password login, hence the ban.

To change this behavior I had to edit /etc/fail2ban/filter.d/sshd.conf and change

    ^%(__prefix_line)sFailed (?:password|publickey) for .* from <HOST>(?: port \d*)?(?: ssh\d*)?$

to

    ^%(__prefix_line)sFailed password for .* from <HOST>(?: port \d*)?(?: ssh\d*)?$


and, of course, restart the daemon. From a security point of view, I find it highly unlikely that an attacker might use brute force with public keys so the setup is still safe.
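
If you want to convince yourself of the effect before touching the daemon, the two patterns can be tried out with Python's re module. This is a simplified sketch: the log prefix that fail2ban handles through %(__prefix_line)s is left out, fail2ban's host placeholder is replaced by a plain IPv4 group, and the two log lines are made-up samples.

```python
import re

# Simplified stand-ins for what fail2ban substitutes on its own.
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"
TAIL = r"(?: port \d*)?(?: ssh\d*)?$"

original = re.compile(r"Failed (?:password|publickey) for .* from " + HOST + TAIL)
modified = re.compile(r"Failed password for .* from " + HOST + TAIL)

# Made-up sshd messages, using a documentation-only IP address.
failed_password = "Failed password for root from 192.0.2.10 port 4242 ssh2"
failed_publickey = "Failed publickey for alice from 192.0.2.10 port 4242 ssh2"

for name, pattern in (("original", original), ("modified", modified)):
    for line in (failed_password, failed_publickey):
        print(name, "matches" if pattern.search(line) else "ignores", repr(line))
```

Only the modified pattern ignores the "Failed publickey" line, so a rejected key no longer counts towards a ban while wrong passwords still do.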

Category: Linux

disabling client side caching for the whole site in Django

If you have to disable client side caching in Django because of a defective proxy imposed by an ISP that caches pages it shouldn't, this middleware will come in handy:

    from django.utils.cache import add_never_cache_headers

    class DisableClientSideCaching(object):
        def process_response(self, request, response):
            try:
                # only logged-in users get the never-cache headers
                if request.user.is_authenticated():
                    add_never_cache_headers(response)
            except Exception:
                # request.user can be unavailable, e.g. on the /foo/bar -> /foo/bar/ redirect
                pass
            return response

It kicks in only for logged-in users. The try block is there because the automatic redirection from /foo/bar to /foo/bar/ triggers an exception in request.user.is_authenticated() - some data is not available in the request object.

Category: Django

using git to track the Linux kernel

If you closely follow the upstream releases of the Linux kernel, you would benefit from cloning Linus' git tree like this:

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

You might also want to use the 2.6.x.y releases that apply various bug fixes to the latest stable 2.6.x version. However, they are kept in a different git tree that shares many objects with the main one we just cloned. Fortunately, git can borrow objects from a reference tree, so we can save time and bandwidth by doing:

    git clone --reference linux-2.6 git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.33.y.git

From here on it's the usual "git tag" to see the tags that can be checked out, getting back to the master branch before running "git pull", etc. Just remember to pull the main tree before the 2.6.x.y one. Have fun configuring and compiling your custom kernel!

Category: Linux
