Archive for the ‘Random’ Category.

SXSI

After a while I can happily say that our SXSI system is available for downloading. Kim created a package that can be easily installed and tested. In this post, I include a step by step tutorial on how to build the system and index a sample XML file.

First we need to install the dependencies in our system. In this case, I’ll show how to do it in Ubuntu 11.10.

$ sudo apt-get install ocaml-ulex ocaml-findlib ocaml-nox libxml++2.6-dev camlp4

Then, download the package. To decompress and compile the package do:

$ tar xvf sxsi.tar.gz
$ cd sxsi/libcds
$ make
$ cd ../xpathcomp/src/XMLTree
$ make clean all
$ cd ../../
$ ./configure
$ ./build
$ cd xpathcomp

This generates a binary called “main.native”. This binary allows you to index and query xml files. Lets first generate an xml file with the standard xmark tool

$ ./gen_xml -f 1 > sample.xml

Now, in order to index this file we run:

$ ./main.native -v -s sample.srx sample.xml ""
Parsing XML Document : 84729.9ms
Mem use before: VmRSS:	    5380 kB
Mem use after: VmRSS:	  117652 kB
Building TextCollection : 189371ms
Mem use before: VmRSS:	  117916 kB
Mem use after: VmRSS:	  237264 kB
Building parenthesis struct : 303.961ms
Mem use before: VmRSS:	  237264 kB
Mem use after: VmRSS:	  237528 kB
Tags blen is 8
Building Tag Structure : 4715.13ms
Mem use before: VmRSS:	  237528 kB
Mem use after: VmRSS:	  214988 kB
Number of distinct tags 92
Building tag relationship table: 1209.822893ms
Parsing document: 280366.960049ms
Writing file to disk: 115.392923ms
character 0-0 Stream.Error("illegal begin of query")

Yes, I gave an empty query, but I didn’t want to search anything yet ;-). We have an indexed version of sample.xml with standard options saved as sample.srx. You can play with different indexing options, run main.native without parameters to see the full set of options supported at the moment. The result so far looks like this:

$ ls -lh sample.*
-rw-r--r-- 1 fclaude fclaude 188M 2011-10-31 23:10 sample.srx
-rw-rw-r-- 1 fclaude fclaude 112M 2011-10-31 23:02 sample.xml

Finally, as an example, we can count the number of results for the query “/site/regions/africa”:

$ ./main.native -c -v sample.srx /site/regions/africa
Loading tag table: 4.344940ms
Loading parenthesis struct : 304.395ms
Mem use before: VmRSS:	    5380 kB
Mem use after: VmRSS:	    7492 kB
Loading tag names struct : 0.049ms
Mem use before: VmRSS:	    7492 kB
Mem use after: VmRSS:	    7492 kB
tags_blen is 8
11 MB for tag sequence
Loading tag struct : 11.366ms
Mem use before: VmRSS:	    7492 kB
Mem use after: VmRSS:	   27492 kB
Loading text bitvector struct : 11.738ms
Mem use before: VmRSS:	   27492 kB
Mem use after: VmRSS:	   29076 kB
Loading TextCollection : 144.782ms
Mem use before: VmRSS:	   29076 kB
Mem use after: VmRSS:	  194604 kB
Loading file: 478.782892ms
Parsing query: 0.061035ms
Parsed query:
/child::site/child::regions/child::africa
Compiling query: 0.048876ms
Automaton (0) :
States {q₀ q₁ q₂ q₃ q₄}
Initial states: {q₀}
Marking states: {q₀ q₁ q₂ q₃}
Topdown marking states: {q₀ q₁ q₂ q₃}
Bottom states: {q₀ q₁ q₂ q₃ q₄}
True states: {q₄}
Alternating transitions
_________________________________
(q₀, {'' })         → ↓₂q₀ ∧ ↓₁q₁
(q₀, Σ)             → ↓₂q₀
(q₁, {'site' })     → ↓₂q₁ ∧ ↓₁q₂
(q₁, Σ)             → ↓₂q₁
(q₂, {'regions' })  → ↓₂q₂ ∧ ↓₁q₃
(q₂, Σ)             → ↓₂q₂
(q₃, {'africa' })   ⇒ ↓₂q₃ ∧ ↓₁q₄
(q₃, Σ)             → ↓₂q₃
(q₄, Σ)             → ↓₂q₄ ∧ ↓₁q₄
_________________________________
Execution time: 1.343012ms
Number of results: 1
Maximum resident set size: VmHWM:	  200084 kB

And that’s it. Now you can play further with SXSI :-)

Update

Yes, I’ve been lazy when it comes to writing here :-). I decided to go and write a short update on what’s been going on lately before going into my next post, which is going to take me a little bit longer. This post is mainly based on my progress report I’m supposed to be writing. In Waterloo, PhD students have to write a progress report every year telling the school what they did during the year.

The last two months have been quite interesting, I just updated the publications page accordingly. Together with Patrick K. Nicholson and Diego Seco we got a paper accepted into SPIRE 2011 about constructing Wavelet Trees in little space (almost in place). Also, together with Gonzalo Navarro, we are going to be giving a tutorial on space efficient data structures.

Another set of good news came from CIKM, where I’m a coauthor on two short papers accepted this year. One is with Antonio Farina, Miguel A. Martinez-Prieto and Gonzalo Navarro, where we present indexes for highly repetitive collections in two scenarios, when we want to query documents containing a pattern or positions in documents where the pattern is contained.

The second article that went in is a joint work with Susana Ladra, who recently (April) defended her PhD (It doesn’t come as a surprise to anybody that knows her that she did great :-)). This article mixes two know representations and uses the information of the url to split the graph and achieve a structure that supports navigation in both directions, offering the best tradeoff so far.

Finally, to close the set of good news, I was awarded with a U.S./Canada Google PhD. Fellowship in Search and Information Retrieval. This came as unexpected and highly appreciated news, thanks to this I’ll have my funding till the end of my PhD (hopefully). As part of the fellowship I was assigned a research mentor from Google for the next two years. My mentor is Stefan Büttcher, one of the authors of “Information Retrieval: Implementing and Evaluating Search Engines”. He also implemented a search engine called Wumpus. As you can imagine, I’ve a lot to learn from him and I’m looking forward to this experience :-).

And that’s it, now I go back to finishing my progress report due tomorrow.

Project/task management software – Qubity

Today I’m writing about something less technical. In fact, I would say I’m more kind of advertising something: Qubity. Qubity is a project/task management software developed in Chile, some of the people behind this awesome project are good friends of mine, and thus I decided to contribute with a little bit of publicity :-). I’ll try convincing you to try the system, in beta, by just telling you what it does.

Here we go!

Continue reading ‘Project/task management software – Qubity’ »

Disk Crash, Recovering Files and Doing Backups

About one and a half weeks ago I had a disk crash. I didn’t lose anything, but was pretty close, mainly because I deleted by hand an important file :-(.

It is interesting to talk about my disk crash because I faced many problems to bring my computer back. Luckily my notebook has two hard drives, so I’m up and running with the secondary one now, but to do so I had to clone the recovery partition. Then I copied my backups to my home directory, and in the process I deleted an important file, which took me days to recover. And finally, I wrote a small script to back up things. This script is not a program (there is no error checking or anything like that), but it shows how to keep an encrypted backup.

Continue reading ‘Disk Crash, Recovering Files and Doing Backups’ »

Land of LISP and Lazy Evaluation

One of my recent read was “Land of LISP” by Conrad Barski. It’s an unconventional programming book, packed with comics. Most examples are tiny games you can code in a couple of pages. I personally liked the book a lot, it is fun to read and presents a language that, in my opinion, is also fun.

My experience with LISP is quite limited, so most of the things I found in the book were new to me (I knew how to define functions and basic stuff, but nothing about macros, only a couple of built-in functions, etc.). One of the things I liked the most was one of the examples for macros where the author presents a simple solution to get lazy evaluation. In the book, the author codes a game with some weird rules, and I don’t think I would learn much by just copying that example, therefore, I will use the same idea here but with our old friend tic-tac-toe. I have to warn you that the implementation I’m going to post is not a good tic-tac-toe implementation, you can probably find more challenging opponents. The main goal of this exercise is to illustrate the lazy evaluation. Having said that, lets get started. Continue reading ‘Land of LISP and Lazy Evaluation’ »

Some interesting links

Of course, the previous post does not count, so I have to start with something. Here is a list of interesting talks/articles that I’ve seen lately.

Dan Meyer: Math class needs a makeover

Points out some problems in math education, where doing math becomes just pattern matching. Search for the variables you need and replace them in the formula. The proposal is to make them realize the information needed to solve a problem, as in real life, where it’s rare to get exactly the pieces of information you need.

Conrad Wolfram: Teaching kids real math with computers

The main critique to the math education system is that it focuses in calculation (computation) instead of real math. The proposal is to use computers to do the calculation and allow children to get intuition on it by playing with them, seeing for what the things they learn are useful. An interesting detail I found in this talks is that the presenter’s proposal for teaching the basics of calculation, like multiplicating big numbers, can be accomplished teaching the kids how to program that method instead of making the repeat it 10^6 times.

Arthur Benjamin’s formula for changing math education

A 3min short opinion-video. The main idea is to include statistics as the main subject. The main argument for this is that most people only need that in their day to day life (plus the basics, I guess).

Ken Robinson: Changing education paradigms

Those videos are awesome, they really catch you attention and it’s hard to close the video before it finishes.

10 easy ways to fail a Ph.D.

The title explains exactly what the article is about. And in it you can find a link to another article I found really interesting: Productivity tips, tricks and hacks for academics.

 

 

New Web/Blog, Why?

I updated my Web and my blog (it used to be @wordpress). The main reason to do so was to get everything in one place and to move from http://www.recoded.cl to http://fclaude.recoded.cl. The main domain redirects to this page and will continue to do so for a while, since I don’t have time right now to setup another site.

Some of the projects I’m hosting at this domain are starting to attract people to collaborate in them, and since those projects are hosted as subdomains of recoded, I wanted to allow the option of using the root domain pointing to the projects instead of being my personal page. Besides, that makes it okay to also potentially host projects where I’m not involved.

Drawings for Latex

Maybe many of you know many programs for drawing in Linux that allow you to embed the image in a \LaTeX document and at the same time use the math typesettings of \LaTeX. I used to draw in Dia (dia-gnome in Ubuntu), then I devolved to xfig. Ok, lets stop here, I say devolved because I like better dia, not because I want to start a religious fight :-).

A couple of months ago Alejandro showed me a program called IPE (also in Ubuntu repositories). This program is amazing, it is easy to use (and that coming from a really bad user) and it supports \LaTeX in a very friendly way, just type latex into the text fields and then with Ctrl+l you make the transition between the code and the actual drawing, you can see the way the picture looks right away and what you see is exactly what you get.

The only problem so far, I still don’t know how to change a text after typing it, but there is a solution for this (besides the brute-force approach of making the whole image from scratch). When you save the image as .ipe file, the file contains an XML that represents the image and has the source of each text, you can modify it there and then load the image into IPE again.

I’m editing the post to add the link to IPE, as Victor suggested:

http://tclab.kaist.ac.kr/ipe/

Wireless Card Linux/Windows

I moved to my new place last month and the second thing I got was the Internet connection (first was the bed). My new apartment is small but has a *really* bad distribution of all the plugs in the walls. Since I didn’t have any good way to connect with a wire my computer to the router, I bought a wireless card. The card is a TRENDnet TEW-421PC/423PI, which is really cheap and it’s supposed to work fine. My apartment is so small that it shouldn’t give any problems, yet it did. The connection did drop randomly from time to time, as always, the drops did occur on the worst moment possible. It did happen in Linux (Ubuntu 9.04 32 bits) and Windows XP, but my notebook was working fine (so I assumed it was the card and not the router). When I was about to take the card back to the place I bought it, I found a page that explained that you could update your driver for the card using windows update. That did work perfectly, and then the problem for Linux was remaining. What I did was to install ndiswrapper with the drivers that came with the card and then copy the file rtl8185.sys from the folder system32 (in Windows) to /etc/ndiswrapper/net8185/. Now my Linux works with the new driver too, until now everything seems to work fine.

Google Reader

A couple of weeks ago I discovered Google Reader. By this time I think I’m almost selling my soul to Google: Gmail, Google Calendar, this blog, and now, Google Reader. Still it’s a really great application, it saves me from checking lots of pages that I visit almost every week and sometimes don’t find anything new or too many new things that I don’t know where to start. Now I just visit Google Reader every day and that’s it.

Ok, what’s this post about? I’m not writing the post to promote Google Reader itself, but to list the subscriptions I find more interesting :-). Here’s the list:

– http://www.wikicfp.com/cfp/rss?cat=algorithms
– http://blog.computationalcomplexity.org/feeds/posts/default
– http://export.arxiv.org/rss/cs
– http://mysliceofpizza.blogspot.com/feeds/posts/default
– http://www.phdcomics.com/gradfeed.php
– http://terrytao.wordpress.com/feed/
– http://www.mdpi.com/rss/journal/algorithms/

And you can also add my blog :P http://fclaude.recoded.cl