Sunday, September 27, 2009

Deploying Django to replace a PHP site

From October 2007 to May 2008 I developed a webapp in PHP. Over the past year and a half since it's been up, I've added a few features to it, and it's slowly grown in popularity. A few months ago I came to the conclusion that the site need a total rewrite in order for the code to move forward.

After doing some research, I decided on Django, since just about everything I've read about the framework has been positive.

Fast forward to now. The Django version of the site is about 90% done. It uses way less code, and does everything the PHP site did and then some. Plus, it really only took me 2 months to do the Django version, whereas it took my about 6 or 7 to do it in PHP. Some of that has to do with Django's tools, but mostly the reduced time is attributed to me being a better programmer, as well as the overall design of the site already being determined.

Deploying

Now I must somehow figure out the best way to replace all the PHP code with the Django code. Tjis is not an easy task. The Django version of the site has a completely different authentication method than the PHP version. It's not as easy as jut migrating all data from the PHP database, to the Django database. (my PHP site uses MySQL and the Django version of my site uses PostGIS)

My Plan

This is what I have decided to do: Right now I have the old PHP version of the site located at domain.com, and the Django version located at beta.domain.com. I want to rename the PHP domain to old.domain.com, and then rename the Django url to plain ol' domain.com. Also I am going to have it so if someone tries to visit a link such as "domain.com/page.php", it will detect it's a link to the old site (due to the presence of ".php" in the URL), then redirect that request to "old.domain.com/page.php".

Mod_rewrite, right?

There are a few ways to do it. The first way that pops into the mind of a web developer is to create a new mod_rewrite rule. Mod_rewrite is a well-known apache module that handles rewriting URL's. I have never really gotten used to using mod_rewrite, mostly because theres no real way (that I know of) to test rules without breaking your site.

My Solution

After thinking about the problem for a few days, I came up with a pretty simple and convenient way to do the redirection all within Django. This is how:

First, point the following domains to your django app: domain.com, and beta.domain.com. Point old.domain.com to the PHP app. If you're using webfaction for your host (and you should if you're wanting to deploy a django app on a shared hosting budget), then this is really easy. Just go to the control panel and point your subdomains to the proper application profiles. If you're not using webfaction, then you'll have to mess around with your apache's httpd.conf settings.

Secondly, add a new app within your django project called "redirect". Inside that new app, create a new view called "redirect". That view should look like this:


from django.http import HttpResponseRedirect

def redirect(request):
"""redirect this request to the old domain
(the PHP version)
"""

return HttpResponseRedirect("http://old.domain.com" + request.get_full_path())


Basically what this does is redirect all requests to a new domain. We want to send all requests trying to get a ".php" file through this view. We achieve this by adding the following to urls.py:


(r'\.php', "redirect.views.redirect", ),


For best results, put this urlpattern at the very bottom of the list. This urlpattern matches any url that contains ".php" anywhere in it.

And thats it. I personally haven't deployed this method completely yet, as my Django site still has a few weeks of testing to go through before it's sent out to the public. Once I get it out there, I'll edit this post with any problems or workarounds I encountered.

Tuesday, September 22, 2009

Cron jobs with django made easy

The Problem:

I run a Django website. Its a pilot logbook application, where pilots can use the site to keep track of their flying experience. Each flight you enter to your logbook is represented by the Flight model. The Flight model contains all the information about the flight, such as comments about the flight, length of the flight, when the flight took place, and the route of the flight. The route of the flight is represented by the Route model, which is attached to the Flight model by a ForeignKey relation. The way the user logs a route is by entering a string in to the route field of the FlightForm such as "KLGA - KBOS - KLGA", which represents a flight from La Guardia to Boston, then back to La Guardia. When the FlightForm is saved, the string gets translated to a Route instance via a class method I created. Adding the route to the flight looks a little like this:


r=Route.from_string("KLGA-KBOS-KLGA")
flight.route = r
flight.save()


The problem is that the Route object is fairly complex. It contains a few fields that represent pre-rendered HTML, as well as RouteBase instances (in this case three, two KLGA's, and a KBOS). The RouteBase object then is connected to an Airport instance, which stores the airport's name, city, and coordinates. Since the Route object needs to have foreign key relations for it to make sense (routes with no airports doesn't make sense), the route object has to be saved to the database before the RouteBase's can be added.

What ends up happening here, is that whenever a flight is edited, whether or not the route is edited, a new route instance is created, and the old one is just discarded. Or if the user wants to delete a flight, the route object remains as well. Over time, orphaned routes start to build up, and may cause queries to be slower that they otherwise could be.

The Solution:

The solution here is very quite easy. Create a function like this:


def delete_empty_routes():
from route.models import Route
Route.objects.filter(flight__isnull=True).delete()


...and run it once a day.

But how should we do this? There are a few ways. Some are easy, and some are hard. The easiest way (well, the way I do it) is to make this function a view:


def delete_empty_routes(request):
from route.models import Route
from django.http import HttpResponse
qs = Route.objects.filter(flight__isnull=True)
c = qs.count
qs.delete()
return HttpRequest("deleted %s Routes" % c,
mimetype="text/plain")


and whenever you want to clear out all empty routes, attach this view to a URL, then just hit the URL. For automation, you can easiely add this to your crontab:


6 30 * * * wget http://domain.com/delete-empty-routes


...which will hit that URL every day at 6:30 AM and clear out all empty routes. Another Problem: My site also has a feature where the user can elect to have an email sent to them every few days or weeks with a zipped CSV file attached that contains all their flights they have logged with the site. In the aviation world, your logbook data is very important. If you lose all that data you are screwed. A lot of my users are apprehensive about storing their data with me, fearing that the site will go down one day, taking their data with it. Once I implemented the email feature, I saw my signups skyrocket.

Another Solution:

Again, the solution to this problem is to create a view function that emails each user a backup of their data:


def email_all_users(request, interval):
#interval = an int representing weekly/monthly/biweekly
profiles = Profile.objects.filter(backup_freq=interval)
for p in profiles:
email_backup(p.user) #create backup then send to user
return HttpResponse("success!")


...then set it to a URL, and then to a crontab:


30 3 1 * * wget http://domain.com/send_email-1
30 3 7 * * wget http://domain.com/send_email-2
30 3 14 * * wget http://domain.com/send_email-3
30 3 21 * * wget http://domain.com/send_email-4


One last Problem:

Now you successfully have a URL on your site that, when hit, will send an email to each user who has elected to receive backups. The problem now is that you have a URL on your site, that when hit, sends an email to each user. What is a search engine gets ahold of this URL? What if a user gets ahold of it? Your users will get spammed.

One last Solution:

To protect this function from being hit by unauthorized persons, we must edit the function a little bit:


def email_all_users(request, interval):
from settings import SECRET_KEY # make sure the secret key is present, or else fail
assert request.POST.get('key', "") == SECRET_KEY
#interval = an int representing weekly/monthly/biweekly
profiles = Profile.objects.filter(backup_freq=interval)
for p in profiles:
email_backup(p.user) #create backup then send to user
return HttpResponse("success!")


Now in your crontab, add the following:


30 3 1 * * wget http://domain.com/send_email-1?key=4f46g...
30 3 7 * * wget http://domain.com/send_email-2?key=4f46g...
30 3 14 * * wget http://domain.com/send_email-3?key=4f46g...
30 3 21 * * wget http://domain.com/send_email-4?key=4f46g...


If you want to be even more thorough, you can have the crontab call a python script which adds your settings file to the python path (if it isn't already), then import the SECRET_KEY that way, but doing it this way where you copy/paste the SECRET_KEY works too. We could also have it return a 404 error instead of just an assertion, but either way it still works.