April 30, 2009

what did you say?

Wordle - Gordon_s tweets-1.jpg

I've been poking around at the Twitter API, in part just out of curiosity about what features are exposed. I have an interest in writing some visualisation widgets based upon it, and the iPhone development course is also using a Twitter client as something of a 'hello world' app. Today, Tim O'Reilly pointed to a wordle visualisation of everything he has tweeted and gave a link to some code that can be used to download all of your own tweets. I had a look at it and decided to write something similar, using the Twitter API directly rather than scraping the Twitter site.

The Twitter library I've been using is the excellent, minimalist Python Twitter Tools (ptt) by Mike Verdone. Its main advantage over other Python Twitter libraries is that ptt doesn't redefine any of the API calls: the calls you make follow the published Twitter API exactly. As a result, it is incredibly easy to use. The 100 or so lines it is implemented in are also a very instructive read, to see how it is put together. I think it is a great example of how attribute access in Python can be used.
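To give a feel for the attribute trick, here is a minimal sketch of the general technique. It is not the ptt source: the ApiCall class, its url() method and the example.invalid host are all made up for illustration. The idea is that __getattr__ is only called when normal attribute lookup fails, so each unknown attribute can simply extend a path.

```python
class ApiCall(object):
    """Turn chained attribute access into an API path (illustrative only)."""

    def __init__(self, base_url, path=()):
        self.base_url = base_url
        self.path = tuple(path)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so base_url, path and
        # url() keep working as ordinary attributes.
        if name.startswith("_"):
            raise AttributeError(name)
        return ApiCall(self.base_url, self.path + (name,))

    def url(self, fmt="json"):
        """Build the URL for the path collected so far."""
        return "%s/%s.%s" % (self.base_url, "/".join(self.path), fmt)


# Hypothetical host; the attribute names mirror a documented-style path.
api = ApiCall("https://api.example.invalid")
print(api.statuses.user_timeline.url())
```

Because nothing is hard-coded, a wrapper built this way never needs a method per API call, which is presumably why such a library can stay so small.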

The code I wrote is available for download. It respects the rate limiting imposed by Twitter and will output all of the tweets for a particular user to a file called <username>.tweet in the directory it is run from. You can change which users are fetched in the main() function. The resulting text file can be opened and its contents copied and pasted into the wordle creator.
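The overall shape of a downloader like this is roughly as follows. This is a sketch rather than the downloadable script: fetch_page is a hypothetical callable standing in for whatever API call returns one page of tweet text, and the fixed pause is just the crudest possible way to stay under a rate limit.

```python
import time


def fetch_all(fetch_page, min_interval=1.0, sleep=time.sleep):
    """Collect every item from a paged source, pausing between requests.

    fetch_page(page) is a hypothetical callable: it returns a list of tweet
    texts for that page number, and an empty list once the pages run out.
    """
    items = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        items.extend(batch)
        page += 1
        sleep(min_interval)  # crude rate limiting: space the requests out
    return items


def save_tweets(username, tweets):
    """Write one tweet per line to <username>.tweet in the current directory."""
    filename = "%s.tweet" % username
    with open(filename, "w") as out:
        for text in tweets:
            out.write(text + "\n")
    return filename


# Stand-in data so the sketch runs without touching the network.
fake_pages = {1: ["first tweet", "second tweet"], 2: ["third tweet"]}
print(fetch_all(lambda page: fake_pages.get(page, []), min_interval=0))
```

Passing the sleep function in as a parameter is only there to make the loop easy to try out without actually waiting.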


April 26, 2009

dd-wrt

Just installed DD-WRT on my Linksys WRT54G wireless router. I'd been meaning to do this for a while - as an easy way to get a much more functional router than the default firmware shipped by Linksys. What finally motivated me to do it was the recent storm about Time Warner Cable introducing bandwidth caps in Austin. Although TWC seem to have backed down for the moment, they have also recently started disconnecting customers for 'using too much bandwidth' on their infinite bandwidth contracts. The DD-WRT firmware gives me an independent way to monitor my usage and get an idea of how much I typically transmit & receive.

The DD-WRT firmware install wasn't quite as smooth as the documentation might make you believe. The first time I installed and then tried to update the firmware, I got a fairly unhelpful 'Error 2: Access violation' error from the tftp prompt and not much else. I went back through the initial management-mode vxkiller upload and things seemed to work better the second time around. For a while I was worried that I had a brick of a router.

Once back up and running, the settings were very similar to the previous Linksys options, so it was quite quick to reconfigure the wireless settings, port forwarding, DMZ etc. that I was using previously. Now I have historical and realtime graphs of bandwidth usage available. Should be interesting to be able to monitor what's going on. If they are cutting people off for using 44GB per week and saying that is “more than most people use in a year”, I am a little concerned at my 7.2GB in one day. That was a few iPhone development videos from Stanford and then we watched Quantum of Solace last night on the Xbox. Seems like Time Warner consider that aberrant behaviour.
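To put those two numbers on the same footing, here is the back-of-envelope arithmetic. The only inputs are the 44GB per week and 7.2GB in a day quoted above; treating one busy day as a sustained daily rate is an assumption made purely for the comparison.

```python
DAYS_PER_WEEK = 7

cutoff_gb_per_week = 44.0  # weekly figure quoted for disconnected customers
my_gb_in_one_day = 7.2     # the one busy day described above

# Average daily usage that would add up to the weekly cutoff figure.
cutoff_gb_per_day = cutoff_gb_per_week / DAYS_PER_WEEK

# Assumption: what a week would look like if every day were that busy.
my_gb_per_week = my_gb_in_one_day * DAYS_PER_WEEK

print("cutoff averages %.1f GB/day" % cutoff_gb_per_day)
print("7.2 GB/day sustained is %.1f GB/week" % my_gb_per_week)
```

So a single heavy day is already above the daily average implied by the weekly figure, although one day says nothing about a typical week.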

April 08, 2009

iPhone development

iPhone Simulator

Interested in learning about iPhone development? Want to study at Stanford? Don't want to pay the tuition fees? On iTunesU (the lecture streaming part of iTunes) you can follow along with class cs193p from Stanford, on iPhone Application Development. In addition to the good quality video of the lectures, all of the class slides, handouts and assignments are available, for free. If you have an Intel Mac, you can also download the development tools, iPhone SDK and a simulator, again all free. If you do want to actually develop and test applications on an iPhone or iPod Touch, you'll need to pay the $99 developer fee to get the encryption keys that let you run applications on a phone and submit apps to the App Store. At least for the basics, the simulator is useful as a target platform for testing, although there are differences between it and the final platform (features such as multi-touch and the accelerometer are hard to test, for example, unless you want to start shaking your computer).

Lifehacker recently had an article on all of the educational resources that are becoming available on the web, for free. iTunesU is a good example of the sort of teaching resources that are out there, if you look. The quality is variable, but there are some excellent resources if you are prepared to dig.

creating a codeswarm movie

code swarm frame

Download video (3Mb)

A codeswarm is a visualization of the activity within a source code repository. The image and linked video above show the lifetime of one of Verilab's source repositories. You can see code being created, the check-ins as they happen and an indication of which users are doing the work at any given time. It is an example of an 'organic information visualization' and is created using the Processing toolkit. The original visualization tools were developed by Michael Ogawa and the source code is available on Google Code.

In this particular case I created the animation on OS X 10.5, using a combination of codeswarm, ffmpeg and LAME. If you are interested in doing something like this yourself:

First you'll need to make sure you have a recent version of the Java Development Kit installed (JDK 1.5 or later). You'll also need a recent version of Ant installed. (I have version 1.7.0, which ships with OS X as default). Download the code_swarm source and install it. Then execute 'ant run'. If all is well, you should get a dialog box prompting you for the source repository, user name and password.

At this point, I put in the svn+ssh URL for the Verilab repository that I wanted to visualize. Everything fell over, with a Java error (NoClassDefFoundError within com/trilead/ssh2). From this I realised I needed to install the SSH libraries for Java, from Trilead. I downloaded those, unpacked them and added the jar file to my CLASSPATH. Along the way I found out the default OS X CLASSPATH definition is in /System/Library/Java/JavaConfig.plist which may be useful as a starting point.

With that fixed, I again ran 'ant run' and put in the relevant information. A bit of time passes as the checkin information is extracted from the repository, then the visualisation runs. The repository information that was extracted is saved under the ./data directory (look for the latest realtime_sample.*.xml file). This is useful for the next stages, as you don't have to fetch the information again. If you want to create a video of the visualisation, there are a few more hoops to jump through.

You will need to configure codeswarm to save the frames for each stage of the visualisation. You do this by editing the ./data/sample.config file. First off, copy it to a new version for your particular project. Then edit these values:

  • InputFile= [Point it at the new realtime_sample<number>.xml file in the data directory, that contains the checkin information for your project]
  • TakeSnapshots=true

That's all you really need to change. You can also change the other values, to alter the visualisation. The ColorAssignX= statements use regexp values to differentiate different types of checkin and colour code them accordingly. Play around with the other values with TakeSnapshots set to false, re-running the visualisation until you get something you are satisfied with. Then run one more time with TakeSnapshots=true to save off the frame images. You can run with the new configuration by running 'ant run data/your_project.config'.

After running with TakeSnapshots enabled, you'll have a set of images in the ./frames directory (controlled by the SnapshotLocation option in the config file). The final step is to assemble those into a movie. The easiest way I found to do this is to use the command-line utility, ffmpeg. There are a variety of ways to install ffmpeg, but the simplest way seems to be to install ffmpegX and then extract the binary from the application bundle. You can also get it using Fink or MacPorts. If you want to use an audio track with your visualisation, you will also probably require LAME. With ffmpeg working, it is simple to point it towards the image files from codeswarm and produce the final movie. The finishing touch was adding some music from an mp3 file, then limiting the duration via the -t switch, so that the movie ends when the video frames run out rather than playing all of the music.

ffmpeg -f image2 -r 24 -i frames/code_swarm-%05d.png -i 6_sym.mp3 -qmax 15 -t 100 <output_filename>.mpg

You can run 'ffmpeg' without any switches to get help on the options. If all goes well, you should end up with an MPEG format video in the file <output_filename>.mpg.
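The value to give -t is just the number of frames divided by the frame rate. A small sketch for working it out follows; the frames directory and the code_swarm-*.png file name pattern are taken from the command above, while the function names are made up for illustration.

```python
import glob
import os


def count_frames(frames_dir="frames", pattern="code_swarm-*.png"):
    """Count the snapshot images that codeswarm wrote out."""
    return len(glob.glob(os.path.join(frames_dir, pattern)))


def clip_seconds(frame_count, fps=24):
    """Length of the video track: frames divided by frames per second."""
    return frame_count / float(fps)


if __name__ == "__main__":
    frames = count_frames()
    print("%d frames -> use -t %g" % (frames, clip_seconds(frames)))
```

For example, a -t of 100 at 24 frames per second would correspond to 2400 snapshot images.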

February 23, 2009

the death of books

ghost rider

The reports of my death are greatly exaggerated

- Mark Twain

It seems that some universities are moving away from physical books, switching entirely to electronic textbooks. My initial reaction is that this is just a little bit crazy. Electronic reference materials have a place, but I have real difficulty seeing electronic-only textbooks as the best approach. Certainly there is a financial justification and a reduction in the physical weight the students have to carry. There is no doubt an advantage for the book stores too, in having less physical inventory to carry and ship around the country.

But none of this takes into account how you interact with a physical book. It just isn't the same having the material online or in a PDF on a laptop. A screen is harder to read (a good laptop screen is still less than 100dpi, books and print are 300dpi or more) and as it is a lower resolution than printed material, you can only see a small amount of the information at a time. Diagrams and accompanying text are often hard to see all in one place. This is part of the reason why reading on a screen can be so tiring. Also the backlit text is harder on the eyes than reading from a page. The second drawback is how you physically interact with a book - flicking quickly through pages, marking pages with a highlighter, inserting post-it notes, curling up in a chair to read a book, spreading several books and notes out across a table. All of these interactions may eventually be replaced with digital analogues that are as powerful or more so, but it seems we are quite far from that time.

The Amazon Kindle is probably about as good as this gets just now and from what I can tell, it still falls far below a good hunk of printed tree. The Kindle does have a higher resolution screen, which helps with reading for a long time, but the screen is small and the navigation feels clunky. Laptops are worse.

I do find a lot of value in online reference books. I've had a subscription to O'Reilly's Safari for over a year now and have found it to be invaluable, particularly when traveling. I can have access to a variety of reference texts, easily searchable, almost always available (if you have an internet connection). However, I've never been able to read any of the books I have on my Safari subscription, for more than a few pages. It just doesn't seem to work for me. No doubt I'm destined to become a relic in my views on reading, but it seems that we approach reading on a screen differently to a book. I'd love to have some sort of larger Kindle device, linked to a Safari subscription. Some way to really read those books on Safari, rather than just treating them as reference works. It always feels that this is just right around the corner, yet we never quite get there.

February 16, 2009

edward tufte and presenting data

me

I was lucky enough to attend a seminar from Edward Tufte, a couple of weeks ago, on the Presentation of Data and Information. Edward Tufte is probably best known for the book 'The Visual Display of Quantitative Information' and was an engaging and entertaining presenter. He has a very different style from the normal PowerPoint-driven presentation approach. In fact, much of his work is railing against the uses and abuses of PowerPoint and similar slide techniques.

The main take-away I got from the whole day was that if you have to communicate complicated data sets or information, you really need to consider how people will use and interact with the data first. Too often, we go straight to presentation software and start trying to work out how to express the information in slides, rather than taking the time to consider if there are other, better ways to impart the information. Tufte was very keen on the concept of a 'super-graphic', which is a data rich, high resolution physical handout that lets participants see and consider a lot of data at once. A map is a great example of a super-graphic, or the weather page in a typical newspaper. A key part of this is that paper is much higher resolution than a typical computer screen (72dpi on screen against 600dpi in print means you can show a whole lot more data in the same space). This is why multiple display screens are really useful for serious work. It also means that printing out and sharing data is a great way to get information in front of people in a meeting, rather than drip feeding it from slides.

I compare this idea to another guide I saw in the same week on creating PowerPoint presentations, which admonishes that there should never be more than 8 numbers on any slide or graphic. Tufte's response to this was repeatedly 'when did we become so stupid, just because we walked into a business meeting?' People handle large, complex data displays every day in the real world. People read and study sports scores in a newspaper, or financial reports, without any trouble at all.

Let the data drive the presentation format, rather than the presentation software drive how the data is displayed.

stop & search

a cold day in London

One of the least enjoyable experiences on a recent trip to London, last week, happened while I was taking pictures of the London Eye. I was standing a few hundred meters away, shooting with a normal point and shoot camera, just like all the people around me, when a couple of police officers approached me. I'd heard about photographers being hassled in London but was surprised this managed to happen to me within 48 hours of arriving in the city. They started out by saying that 'they didn't really believe I was a terrorist, but were stopping photographers to make people aware that they were watching what was going on'. From there, they handed me a form that listed my rights under Section 44 of the anti-terrorism law, then proceeded to question me about what I was doing, where I was from and why I was taking pictures.

As far as I can tell, even though they themselves said they had no reasonable cause, the Terrorism Act says that's fine. We spent about 5 minutes going through where I've lived and having me justify why I take pictures. Then they wanted to see all the pictures I'd been taking (again, as far as I can tell, in contradiction of their own guidelines on collection of evidence). On looking through the images, one of the officers stated that 'those look just like the sorts of pictures a terrorist would take' and then told me to move on. The picture above is what I was taking when they stopped me. I got a 'stop and search' form listing that the stop was authorised under the anti-terrorism laws and that it was part of a 'pre-planned op'. I can only assume from this that the London police have decided to institutionalise harassing photographers for the sake of security theatre. Particularly if, when they find images that they think would be typical terrorist images, they wave the photographer on.

This is all in a city that seems to have more CCTV cameras than there are people. I'm not quite sure who, if anyone, is actually watching these camera feeds. The whole thing is quite worrying, for someone who has been out of the UK for a few years. We used to make jokes about books like 1984 or movies like V for Vendetta, but it seems that piece by piece typical rights to privacy are being whittled away by a government that is using good intentions to grab as many additional powers as possible. Sure, it is just hassling a photographer in the street, for no reason, while he takes pictures of a tourist attraction, but each time has an increasing chilling effect on what people feel they can do and what government authorities can get away with doing. I didn't argue with the particular officers, mainly as I didn't want to spend half my day discussing it in a police station on my holiday. Maybe that's part of the problem too.

'There's an implicit admission that Section 44 stops and searches do not detect terrorists. This is borne out by the available data. In the financial years 2003/4 to 2006/7, the Met stopped and searched 31,797 pedestrians using the powers of Section 44(2); of these only 79 were arrested in connection with terrorism - less than a quarter of a percent - and even fewer will be convicted. The purpose of deterring is feeble considering the extent to which the Home Office is ready to go to avoid revealing when and where the exceptional powers for Section 44 apply.'

At the end of this five minute waste of time, they started asking me about the number of megapixels my camera had, commented on how impressed they were by the quality of the pictures on the screen and asked where they could buy one and if I'd recommend it.

December 02, 2008

complete

Finally! Now my Mac can do everything my Windows machine can do. I'm so happy.

December 01, 2008

image recovery


I take a lot of pictures. On occasion, I get too impatient when downloading images from my compact flash cards. I'll swap the card without ejecting it properly and sometimes the cards get corrupted. Typically, when that happens the file allocation table of the previous card that was in the reader gets written onto the new card, or the FAT gets corrupted in some other way. The images are still there, but you can't access them. This happened to me last weekend and I didn't have any recovery software on this laptop. I had a look around online and the only recovery programs I could find were close to $100. I had a bit of free time so I decided to try writing my own instead. Turns out a basic recovery tool is actually really simple to put together.

A couple of things made it possible to do quite simple image recovery, successfully. Firstly, I always format the cards in the camera before I use them. So I know when the camera is writing images to the cards, the card is empty. Secondly, I never delete images in the camera. This means there is no fragmentation on the drive. The images are simply stored sequentially on the memory. The FAT format is fairly simple, based on sectors that are multiples of 512 bytes in size, collected together in clusters whose size depends on the disk formatting. Images are written into linked lists of those clusters. Potentially the clusters could be fragmented across the drive, particularly if images are deleted and new ones stored on the disk. With a clean start and no images deleted, it is reasonable to assume that the images will just be stored on consecutive clusters. I think damaged sectors are managed at the physical level on the disks, so they are mapped out of the available space (feel free to correct me on this). Anyway, with these assumptions made, it is possible to write a simple tool to parse a disk image and extract images, with a high likelihood of a successful result.

The first step is to get the data. I did the recovery on a unix system and used dd to get the initial image. You have to dump the actual physical device, not one of the disk partitions (as those are essentially what has become corrupt):

dd if=/dev/rdisk1 of=image.img bs=512

The block size is set to 512 to match the formatting of the compact flash card. This step takes a while, but eventually you'll have an image file, image.img, which is a low level copy of the data on the drive. The next step is to work out a way to identify the files you are looking to recover. I wrote a simple hex dump tool that prints the first few bytes of a file. I used this on a representative sample of the Canon CR2 RAW files to get a search key to identify the start of a file.

--- show_header.py ---

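import binascii
import sys

# Print the first 12 bytes of the file named on the command line, as hex.
with open(sys.argv[1], 'rb') as image_file:
    header = image_file.read(12)

print(binascii.hexlify(header).decode('ascii'))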

--- end show_header.py ---

This little bit of Python can be applied to a group of files with xargs:

ls *.cr2 | xargs -n 1 python show_header.py

From that output, it is easy enough to find a representative number of bytes that can be used to identify the start of a file. I also had recorded some audio with the camera, so did a similar process with .wav files to extract them correctly.

Then all you have to do is iterate through the disk image in block_size chunks, checking for those file signatures at the start of each sector. When you find a file signature, start dumping all the data to a new file, until you find another signature. That's all there is to it. Note that there are no warranties with this. I'm offering no guarantees that it will work, or even will not wipe your computer. Use at your own risk. With this I was able to recover the 150+ images that I'd taken and several audio files. It actually works surprisingly quickly once the disk image has been made. Also worth mentioning that the JPEG header matching is untested, as I didn't have any JPEG files on this particular disk, but is included here for completeness.
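As a rough sketch of that loop (this is not the contents of image_recovery.py, and the function names are made up for illustration): read the disk image one sector at a time, start a new output file whenever a sector begins with a known signature, and keep appending sectors to the current file until the next signature turns up. The two signatures listed are assumptions about what CR2 and JPEG files start with, so check them against the output of show_header.py on your own files. The .wav case is left out.

```python
import os

SECTOR_SIZE = 512

# Assumed file signatures; verify against your own files before relying on them.
SIGNATURES = {
    b"II*\x00\x10\x00\x00\x00CR\x02\x00": ".cr2",  # TIFF-style header plus "CR"
    b"\xff\xd8\xff": ".jpg",                        # JPEG start-of-image marker
}


def match_signature(sector, signatures=SIGNATURES):
    """Return the file extension if the sector starts with a known signature."""
    for magic, extension in signatures.items():
        if sector.startswith(magic):
            return extension
    return None


def recover(image_path, out_dir, signatures=SIGNATURES, sector_size=SECTOR_SIZE):
    """Split a disk image into files at each sector that starts with a signature."""
    recovered = []
    current = None
    try:
        with open(image_path, "rb") as image:
            while True:
                sector = image.read(sector_size)
                if not sector:
                    break
                extension = match_signature(sector, signatures)
                if extension is not None:
                    # A new file starts here, so finish the previous one.
                    if current is not None:
                        current.close()
                    name = "recovered_%04d%s" % (len(recovered), extension)
                    path = os.path.join(out_dir, name)
                    current = open(path, "wb")
                    recovered.append(path)
                if current is not None:
                    current.write(sector)
    finally:
        if current is not None:
            current.close()
    return recovered
```

Usage would be something like recover('image.img', 'recovered'), with the output directory created beforehand. Each recovered file runs on to the next signature, so it carries some trailing padding, and the last one includes everything up to the end of the image. A short signature such as the JPEG one could also match by chance at the start of a sector in the middle of another file, which would split that file in two.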

Download the source for image_recovery.py (you'll probably need to change the file extension - web server doesn't like serving .py files)

November 10, 2008

a command line for the gui

open

I've been experimenting with Mac OS X now for a few months, trying to work out if it is a reasonable platform for Verilab to use internally for our various computing needs. One thing that I've come to love and now struggle to live without is a small add-on called Quicksilver. Superficially it isn't very interesting. A fast application launcher - where's the fun in that? But after you delve a bit deeper, it becomes something else. It is really a command line interface for OS X, on top of the GUI.

I hit a key and up it pops. Type a few characters and I can launch an application. I don't have to search through drop down menus or find icons on a desktop - much faster. I want to get a music player to jump to the next track - hot key and type 'next'. Stop the music, type 'stop'. The magic is that I don't have to switch away from what I'm currently doing, go to a different window or application and do anything. Want to email someone? Hot key, type a bit of their name and select their email address - again, without switching away from what you are working in. I can even open and edit text files without starting up a text editor - extend a todo list, add a calendar entry to my google calendar, all from the command line within the GUI. All without switching away from the current context. I can run unix terminal commands, search in documents, select groups of files and email them to someone, do quick calculations. It is amazingly more efficient than using a mouse and hunting for applications and buttons.

I've always been a bit of a command-line junkie, wanting to know the keyboard shortcuts for things in a GUI environment and liking having a command prompt, but Quicksilver is different. It's a visual blend of command-line and GUI, taking the best bits of both and putting them right under my fingertips. Highly recommended. Shame there isn't anything nearly as good for Windows XP or Linux that I've found. Look here for more ideas about what Quicksilver can do.

