## Seven Languages in Seven Weeks

I read this book, by Bruce Tate, some weeks ago and totally recommend it. You can buy it from amazon or the pragmatic bookshelf.

### What’s in it?

From the title of the book is easy to get an idea. This book explores seven different programming languages, the idea is that you spend one week using each programming language and get an idea of what’s out there to offer alternatives to the standards we are used to. By the standard I mean what I consider the standard, based on my experience most people program in Python, PHP, C++, C# or Java.

The languages covered areRuby, Io, Prolog, Scala, Erlang, Clojure, and Haskell.

The book has one chapter per language (plus intro and wrap-up chapters). Each language is presented in three parts, called days. Usually the first one shows the very basics, like input/output or math operations. The second day is usually used to present something that is different for this language, and the third day to present a harder example where the language shows to be useful and superior to others. For example, in Prolog, you can solve Sudoku puzzles by day 3 with little effort.

### Why do I recommend it?

1. The third day shows you an example where the language actually helps you. This is in contrast to what my programming languages course was,  where we went through 4 different languages. In that course we implemented pretty much the same things in each language. Maybe someone would disagree with my following statement, but in my opinion, implementing QuickSort in Prolog just makes you think you are wasting your time. You know how to implement it in C, and it works faster, so what’s the point? In this book you don’t do that kind of things. As I mentioned before, one of the examples for Prolog is solving a Sudoku puzzle. This shows you something great about the language. You can do this with almost no effort. If you try implementing it in another language, say Java, it is certainly going to take more effort.
2. The presentation is clear. It is easy to follow and you don’t feel lost at some point looking at code you don’t understand. For instance, in the basics, you usually go through basic math operations which are really similar to at least one language you know, yet the author takes the time to go through them. The explanations are of the right length to not be boring either, which is also a great plus. Another point regarding the basics, the author goes into things like typing with examples of that part, which also makes it much more interesting.
3. You are likely to find lots of things you didn’t know. In my case, I haven’t been exposed to that many programming languages, most of them imperative and object oriented. The only language I knew besides that was Scheme and some Common Lisp. I had a great time reading about these other languages that are somehow different and offer powerful constructs that make it easy to do things I know to be hard. For example, the future objects in Io, or the way you can build a service monitor in Erlang.
4. The author is not selling you the languages. Well, maybe some of them more than others. But the important point is that I didn’t find the book to be too biased. In fact, for every language the author presents it and finishes listing advantages and disadvantages. This is a great thing, not only you get information about when to use a language, but also when not to.
5. You get to hear from the people that invented the languages (or important users of it). In every chapter you get to see an interview with someone that can be considered as a worthy representative of the language. I found this really interesting, and adds external opinions to the book, which I think, increases its value.

It is important to make clear that you will not learn the seven languages by reading this book. I would not recommend it to someone trying to learn how to program or trying to learn one of those seven languages.

If you know how to program, and enjoy it, you should consider buying it.

Of course, the previous post does not count, so I have to start with something. Here is a list of interesting talks/articles that I’ve seen lately.

Dan Meyer: Math class needs a makeover

Points out some problems in math education, where doing math becomes just pattern matching. Search for the variables you need and replace them in the formula. The proposal is to make them realize the information needed to solve a problem, as in real life, where it’s rare to get exactly the pieces of information you need.

Conrad Wolfram: Teaching kids real math with computers

The main critique to the math education system is that it focuses in calculation (computation) instead of real math. The proposal is to use computers to do the calculation and allow children to get intuition on it by playing with them, seeing for what the things they learn are useful. An interesting detail I found in this talks is that the presenter’s proposal for teaching the basics of calculation, like multiplicating big numbers, can be accomplished teaching the kids how to program that method instead of making the repeat it $10^6$ times.

Arthur Benjamin’s formula for changing math education

A 3min short opinion-video. The main idea is to include statistics as the main subject. The main argument for this is that most people only need that in their day to day life (plus the basics, I guess).

Those videos are awesome, they really catch you attention and it’s hard to close the video before it finishes.

10 easy ways to fail a Ph.D.

The title explains exactly what the article is about. And in it you can find a link to another article I found really interesting: Productivity tips, tricks and hacks for academics.

## New Web/Blog, Why?

I updated my Web and my blog (it used to be @wordpress). The main reason to do so was to get everything in one place and to move from http://www.recoded.cl to http://fclaude.recoded.cl. The main domain redirects to this page and will continue to do so for a while, since I don’t have time right now to setup another site.

Some of the projects I’m hosting at this domain are starting to attract people to collaborate in them, and since those projects are hosted as subdomains of recoded, I wanted to allow the option of using the root domain pointing to the projects instead of being my personal page. Besides, that makes it okay to also potentially host projects where I’m not involved.

## Coming back to my blog

I haven’t written in here for a long time. There are many reasons for that, lots of things to do,  traveling, and time. Yet, the most important one, I didn’t know what to blog about.

The main problem is not finding what to write about, but to decide what should actually go into the blog. So far I’ve mostly posted about my trips and papers, and that’s something one should do. If you don’t think people want to know about what you write, then maybe you shouldn’t have had written that paper in the first place :).

The missing part is the things I didn’t want to post. For example, talks and useful links I find. Or comments about books I read. After some thought I decided that I should also include that. So my new goal, copying from Alex Bowe, is to post once every two weeks.

## CSCBCE, WAA and Finland!

From May 20-22 we were hosting CSCBCE 2010 in Waterloo. I was part of the organizing committee so it was a very busy week. Everything went quite well, Bob (the chair of the conference) was able to get a lot of funding and that made everything in the conference much easier, and of course, the meals awesome! :D. I couldn’t attend much to all the talks/tutorials, since I couldn’t escape of my TA duties (wasn’t really away :) ), but I certainly enjoyed the ones I could attend and met a lot of interesting people.

After that, on the 23rd I left to Helsinki to present the paper we wrote with Gonzalo Navarro, “Extended Compact Web Graph Representations”. It was published in Algorithms and Applications, a Festschrift for Prof. Esko Ukkonen for his 60th birthday. The workshop was great, with an interesting variety of topics. After it we went for dinner to a lovely place in the center of Helsinki :-).

I stayed here in Helsinki to work with Prof. Veli Makinen and part of the group on succinct data structures (Leena Salmela and Niko Valimaki). We have been looking at representations for the graph that comes from the overlaps in the sequences obtained during the genome sequencing process. So far we have some interesting results in practice and some pretty pictures on how this graph looks.

About the city itself, it is an amazing city, very alive and with all sorts of things to do (maybe because it’s summer now). I went for a walk on Saturday and ended up in a festival where a Chilean group was playing. It was an amazing experience, got to meet many other Chileans that live here in Helsinki and we ended up having a really nice party.

I guess that would be my report for now, on Sunday I’m going back to Waterloo and leaving again on Wednesday to Fredericton for a workshop on data structures for spatial data.

## New Domain

I bought a new domain for my personal Web page and to host some projects (The blog stays at WordPress).

The main reason to buy the domain was to be able to mention URLs to source code in papers and be sure that I could maintain the URL for more than 2-3 years. The new address for my Web pages is http://www.recoded.cl/.

As I mentioned in the last post, I uploaded some of the code for the compact/compressed Web graph representations in http://webgraphs.recoded.cl/, the next project I would like to upload is libcds, I’ve been working on the new version for quite a while now, as soon as I have that one I’ll create a Web page for the project and upload it here. The main reason for moving it is that I prefer to have a private svn server and publish only the final code on the Web, plus the option to build the Web page using the tools I feel more comfortable with.

## New papers

I got two really good news after my return from LATIN. Our paper (with Gonzalo Navarro) about compact Web graphs representations was accepted for publications in ACM Transactions on the Web (TWEB). This was one of the main results in my masters thesis. I uploaded some of the code and documentation related to our TWEB paper to http://webgraphs.recoded.cl .

The second good news was that the extended paper about Straight-Line programs (also with Gonzalo Navarro) was accepted into Fundamenta Informaticae. We included some new results. In particular, we gave an improved version for finding the most common substring in a grammar compressed text. Another novelty is an adaptation of our index to the case when the input is not a grammar but a set of rules and a sequence of (non)terminal symbols. In this case, expanding that last sequence, according to the rules, generates the text.

## SPIRE 2010

I was invited to help in the organization of String Processing and Information Retrieval Symposium (SPIRE) this year. The conference is going to be held in Los Cabos, Mexico. The chairs are Edgar Chavez and Stefano Lonardi.

I’ve been to two previous versions of SPIRE, 2007 and 2008, both were a great experience, interesting presentations and a lot  of fun, except for the fines acquired for driving in Australia :-P.

The call for papers for this year can be found here or here for the plain-text version. The submission is due June 20, and given the schedule there is no room for any deadline extension.

## LATIN 2010

I just got back from LATIN 2010, I was organizing committee and it was a really good experience. We got invaluable help from professors and students from UABJO.

The conference started with quite a few problems. To start with, many people could not make it to the conference thanks to a friendly volcano that left Europe uncommunicated. This forced us to change the conference program and a considerable number of talks were cancelled.

The second problem we had (Monday night) is that the UABJO university workers went on a strike. The conference was being held there and in Mexico you are not allowed to enter the place during the strike. The university gave us support and Alex Lopez-Ortiz managed to move the conference to the Hotel Victoria.

Besides those “little” incidents, I think the conference went pretty well (the recommendation comes from a person close to the organization :p). The talks I got to see were really good and I did really like many of them. The others I didn’t understand that much, but I think that’s normal in a broad conference.

I was also presenting a paper with Jeremy Barbay and Gonzalo Navarro about succinct representations of binary relations.

Those are all the news about the conference. About the place there is too much to tell. To start with, the people are friendly and cheering, which makes Oaxaca a really nice place to visit. I took a tour (after the conference) to some places nearby and got to see some ruins, swim in natural pools, drink mezcal. There is a lot to do and visit around. It would take too much space to cover it all :-).

## The SXSI System

I promised a post about the sexy SXSI system in my last post. Its awesome name stands for Fast In-Memory XPath Search over Compressed Text and Tree Indexes, which says much more about what it does.

The main idea for representing the XML file is as follows: the shape of the tree is represented using succinct balanced parentheses, the labels are represented with structures that support rank/select and access queries, and the text with a self-index for the collection of texts inside the xml file.

Let’s go over each part:

1. The tree: the tree is represented using balanced parentheses, by doing a DFS over the tree and writing a ‘(‘ every time you start visiting a subtree and a ‘)’ when you leave the subtree. This representation allows for very fast navigation over the tree, for a good reference about it see the paper by Arroyuelo, Canovas, Navarro and Sadakane.  In the same order we keep another sequence that corresponds to the ids of each node in the tree, we write down the ids in DFS order. This representation allows to move though the tree fast and access node ids quickly. Besides that, something that is really interesting with this combination is the ability to jump to a node with a given id, we just need to perform a select query over the sequence of tags (recall that select(c,i) over a sequence obtains the position where the i-th c occurs). This last operation allows for traversing the tree only looking at the nodes that have a given id.
2. The self-index, representing every text in the XML tree, is based on the FM-Index. Every node that contains textual data introduces a text into the collection. All the elements are concatenated introducing delimiters and every text in the collection gets an id (in our representation the texts are at the leaves of the tree so it’s easy to map back a forward). The id is given by the position of the delimiter in the BWT, and using a range searching data structure we can map to original leaf-id. This index allows for fast pattern matching among the text data present in the XML file.

This an overview, you can get more details looking the original paper. The nice thing is that by using this structures we allow for a flexible navigation of the structure, and an automaton-based search procedure achieves great practical results over real XML files (more details look at the experimental results in the paper). This makes SXSI an interesting option as engine for representing and querying XML files. The idea of the automaton was formalized a couple of days ago, you can see the paper by Maneth and Nguyen.

What’s next? Well, we are not done, there are some operations we would like SXSI to support that aren’t implemented yet. Another issue is that the structure is static. Making it dynamic in practice is quite tricky, as it usually is with succinct representations.

At this moment we are working on a first release (cleaning code, adding documentation, etc), and we hope to have something soon :-).