Archive for the ‘Research’ Category.

## MFCS 2009

I’m back after a while. During this month I took a week off and did a road-trip with Roberto Konow trough Canada. We went to Kingston, Ottawa, Montreal and Quebec City. It was an awesome trip and then life went back to normal with work to catch up.

Now I’m in MFCS 2009, the first day already went by and I’ve had the pleasure to see lots of good presentations :-), that makes my panic mode higher since tomorrow I’m presenting the paper we submitted with Gonzalo Navarro. I’ve already uploaded the slides (and the preprint) to my Web page (http://www.cs.uwaterloo.ca/~fclaude/). Also the hotel in which the conference is being held is great, I’ve already spent around 3-4 hours between the Sauna and the Jacuzzi :-).

I’ll post more about the conference as it goes by and I’ll try to post a small summary of the paper during the night.

## FOCS 2009

The list of accepted papers for FOCS2009 was published a couple of days ago. The page with the abstracts is here and a post on “Graph Theory” included the pdf file for some of the papers (those available).

I was able to find some of the missing papers:

“A Parallel Repetition Theorem for Any Interactive Argument” – Iftach Haitner.

“Optimal Long Code Test with One Free Bit” – Nikhil Bansal and Subhash Khot.

“Submodular Function Minimization under Covering Constraints” – Satoru Iwata and Kiyohito Nagano.

I’ve been checking out the ones that seem more interesting (for me, don’t want to start a war on best papers or imply that other papers aren’t interesting).

One of them is the paper “Space-Efficient Framework for Top-k String Retrieval Problems” by Wing Kai Hon, Rahul Shah and Jeffrey Scott Vitter.  It’s a really interesting paper about a problem we have been looking at for a while (with Diego Arroyuelo, Meng He and Ian Munro) and I found that the solution they propose is a very interesting approach, it allows to handle a wide family of scoring functions for the top-k string retrieval problem.

Another paper that caught my interest is “Models for the compressible Web” by Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Alessandro Panconesi and Prabhakar Raghavan. I couldn’t find the pdf on that one so I haven’t read it yet. It seems to be closely related to my master thesis (yes, it’s written in English) and it would be interesting to see if some of the results there help on explaining in a theoretical way the empirical results we obtained in the thesis.

## Text Partitioning

On optimally partitioning a text to improve its compression
Written by Paolo Ferragina, Igor Nitto and Rossano Venturini

I found this article really interesting and nicely presented. The problem they focus on is:

Problem: Given a compressor $C$ and a text $T$ of length $n$, drawn from an alphabet $\Sigma$ of size $\sigma$. Find the optimal partition of $T=T_1T_2\ldots T_k$ such that $|C(T_1)C(T_2)\ldots C(T_k)|$ is minimized.

This means that we want to cut the text in $k$ pieces, with $k$ unknown, such that applying the compressor $C$ over each piece achieves the best compression possible over the text. This is ignoring possible permutations of the text, such as the Burrows-Wheeler Transform (BWT).

A simple solution is to transform this problem into a shortest path problem, every position in the text is a node in the graph, and every node $i$ is connected with nodes $i+1,i+2,\ldots, n$. The cost of going from node $i$ to node j < i[/latex] is $|C(T_{i,j})|$. It is easy to see that the best partition obtains a total size equal to the minimum path from $1$ to $n$. Here I include a figure of the graph (using IPE :-)). [caption id="attachment_49" align="aligncenter" width="449" caption="Example shortest path for text partitioning"][/caption] The main problem is that just building this graph takes [latex]O(n^3) time. Assume that $C$ takes linear time to compress a sequence, then building the graph takes:

$\sum_{i=1}^{n-1} \sum_{j=i+1}^n j-i =\sum_{i=1}^{n-1}\sum_{j=1}^{n-i} j = \sum_{i=1}^{n-1}\sum_{j=1}^{n-1} j$ $= \sum_{i=1}^{n-1} O(n^2) = O(n^3)$

In the paper they present an algorithm to approximate the problem, they require $O(n\log_{\epsilon+1}n)$ time and achieve an $(1+\epsilon)-$approximation. The main idea behind the approach is to approximate the graph in such a way that, by storing less edges, they can still approximate the cost of the minimum path. They show how to run the algorithm without building the approximated version of the graph, to keep the space consumption low. They also show how to estimate the size of the compression for 0-order and k-order compressors during the process.