Binary search without division, shifts or multiplications

I subscribed to the Daily Coding Problem a while ago on a friend's recommendation. I only have the free version, just for fun: you get one problem per day, with no solution (unless you pay), and the difficulty varies.

Last week I got a problem that asks you to implement a search function that runs in $$O(\log n)$$ time over a sorted array without using division, shifts, or multiplications.

The problem is fun, and at least in my case, my first intuition was wrong. I will go through my solutions and compare them.
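One way to meet the constraints can be sketched as follows (a sketch in Go; the function and variable names are mine, not from the problem statement): build the jump sizes, the powers of two, using only addition, then descend from the largest step, mirroring binary search without ever halving.

```go
package main

import "fmt"

// search returns the index of target in a sorted slice, or -1.
// It avoids division, shifts, and multiplication: the step sizes
// (powers of two) are built up front using only addition.
func search(a []int, target int) int {
	// Collect powers of two up to len(a) using addition only.
	steps := []int{}
	for p := 1; p <= len(a); p += p { // p += p doubles without * or <<
		steps = append(steps, p)
	}
	pos := -1 // last index known to hold a value < target (or -1)
	// Try the largest steps first, shrinking like binary search.
	for i := len(steps) - 1; i >= 0; i-- {
		next := pos + steps[i]
		if next < len(a) && a[next] < target {
			pos = next
		}
	}
	if pos+1 < len(a) && a[pos+1] == target {
		return pos + 1
	}
	return -1
}

func main() {
	fmt.Println(search([]int{1, 3, 5, 7, 9, 11}, 7)) // prints 3
	fmt.Println(search([]int{1, 3, 5, 7, 9, 11}, 4)) // prints -1
}
```

Each candidate step is tried at most once, so the loop runs in O(log n) time, and the only arithmetic used is addition and comparison.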

Dicemath – a small Go service for keeping kids busy

I know this may not be the most fun activity for kids, but it turns out that my 5-year-old likes to do simple addition worksheets. Now that many of us are working from home with our kids, which is mostly trying to work from home while our kids prevent it, we have to take anything we can to keep the little ones busy and entertained.

I can’t complain. Working from home with two little kids is hard, but the truth is that we are amazingly lucky that everyone close to us so far is healthy, and on top of that, we can work; I’m amazingly grateful for that!

Back to the article: since my son likes to do worksheets where he can count out the results of the additions, I built a small website that generates the worksheets. Every morning I print him one sheet, and he either does the exercises or runs away from me because I'll bug him with something he doesn't want to do at the moment; win-win.

You can check the site here. I’ll spend the rest of the article explaining how I built and set the site up. It’s a small example you can go through in one day, and it’s always fun to pause things and do something different. You can find the whole project here (no, I did not write any tests; this was a one-day fun project; that’s it).
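The generation itself is tiny. Here is a minimal sketch in Go of the core idea (this is not the site's actual code; the `worksheet` function and the output format are made up for illustration):

```go
package main

import (
	"fmt"
	"math/rand"
)

// worksheet returns n random single-digit addition problems,
// the kind of exercise described above.
func worksheet(n int) []string {
	lines := make([]string, n)
	for i := range lines {
		a, b := rand.Intn(10), rand.Intn(10)
		lines[i] = fmt.Sprintf("%d + %d = ____", a, b)
	}
	return lines
}

func main() {
	for _, l := range worksheet(10) {
		fmt.Println(l)
	}
}
```

The real site just wraps something like this in an HTTP handler and renders the problems into a printable page.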

SICP Problem 2.6

This is a short fun exercise I also went through while reading SICP. The problem gives a definition for zero and a function to compute the next integer (add 1):

(define zero (lambda (f) (lambda (x) x)))

(define (add-1 n)
  (lambda (f) (lambda (x) (f ((n f) x)))))
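To see these definitions in action outside Scheme, here is a rough transliteration into Go closures (the type and function names are mine; I'm specializing the Church numerals to functions over int for readability):

```go
package main

import "fmt"

// Church numerals: a numeral takes a function f and returns a
// function that applies f to its argument n times.
type fn func(int) int
type church func(fn) fn

// zero applies f zero times: it returns x unchanged.
var zero church = func(f fn) fn {
	return func(x int) int { return x }
}

// addOne wraps one extra application of f around n's applications.
func addOne(n church) church {
	return func(f fn) fn {
		return func(x int) int { return f(n(f)(x)) }
	}
}

func main() {
	inc := func(x int) int { return x + 1 }
	one := addOne(zero)
	two := addOne(one)
	fmt.Println(one(inc)(0), two(inc)(0)) // prints "1 2"
}
```

Applying a numeral to an increment function and 0 recovers the ordinary integer it represents, which is a handy way to check the definitions.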

SICP Problem 1.28

In this post, I’ll go through some fun I had last week with problem 1.28 from SICP. The problem statement asks you to implement the Miller-Rabin primality test in Scheme. I had not read this book and picked it up because of how many times I’ve seen it recommended. So far, I have to say it has been a fun ride; I’m enjoying it a lot!

I started with that implementation, but then I sort of got carried away implementing my own 'prime?' function (a function that determines whether an integer is a prime number or not). I decided to limit it to numbers up to 1 billion. This restriction is entirely arbitrary, but it's big enough to have fun with.

Important disclaimer: the code in this article is a fun little exploration. If you need to test primality in real life, I recommend looking at this paper before deciding how to implement it.
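For context, here is a compact Go version of the exercise's idea (a sketch, not the Scheme solution from the post): `expmod` is instrumented to signal when it finds a nontrivial square root of 1 modulo n, which is what lets Miller-Rabin catch Carmichael numbers that fool the plain Fermat test. For reproducibility this sketch uses small fixed witnesses instead of random ones.

```go
package main

import "fmt"

// expmodCheck computes base^exp mod m, but returns 0 as soon as it
// discovers a nontrivial square root of 1 modulo m, which proves m
// composite (the twist of SICP exercise 1.28).
func expmodCheck(base, exp, m int64) int64 {
	if exp == 0 {
		return 1
	}
	if exp%2 == 0 {
		half := expmodCheck(base, exp/2, m)
		sq := half * half % m
		if sq == 1 && half != 1 && half != m-1 {
			return 0 // nontrivial square root of 1 found
		}
		return sq
	}
	return base % m * expmodCheck(base, exp-1, m) % m
}

// isPrime runs the test with the fixed witnesses 2..k+1 (the classic
// formulation picks random witnesses). Fine for n up to ~1e9, the
// article's self-imposed limit, since half*half stays within int64.
func isPrime(n int64, k int) bool {
	if n < 2 {
		return false
	}
	for a := int64(2); a < int64(k)+2 && a < n; a++ {
		if expmodCheck(a, n-1, n) != 1 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isPrime(1000000007, 5)) // a well-known prime
	fmt.Println(isPrime(561, 5))        // 561 = 3·11·17, a Carmichael number
}
```

The Carmichael number 561 passes the plain Fermat test for every coprime witness, but witness 2 already exposes a nontrivial square root of 1 here, so the instrumented version correctly rejects it.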

Posted in Programming | 3 Comments

Consistent Hashing (Part 1)

Consistent hashing is a well-known method for distributing data across multiple servers. We will implement a simple consistent hashing scheme in Python and play a bit with it. This first part just focuses on having a way of computing hashes; we will build upon it in the later parts.
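The series builds this in Python; as a preview of where it goes, here is the core ring lookup sketched in Go (the hash choice, names, and ring layout are illustrative assumptions, not the series' code):

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
)

// hashKey maps a string to a point on the ring using the first
// 8 bytes of its SHA-1 digest.
func hashKey(s string) uint64 {
	sum := sha1.Sum([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

// lookup finds the server responsible for key: the first server
// hash at or after the key's hash, wrapping around the ring.
func lookup(servers []string, key string) string {
	if len(servers) == 0 {
		return ""
	}
	type point struct {
		h    uint64
		name string
	}
	ring := make([]point, len(servers))
	for i, s := range servers {
		ring[i] = point{hashKey(s), s}
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i].h < ring[j].h })
	h := hashKey(key)
	for _, p := range ring {
		if h <= p.h {
			return p.name
		}
	}
	return ring[0].name // wrap around
}

func main() {
	servers := []string{"server-a", "server-b", "server-c"}
	fmt.Println(lookup(servers, "user:42"))
}
```

The point of the scheme is that adding or removing one server only moves the keys in that server's arc of the ring, instead of reshuffling everything.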

My Old WordPress in docker

This blog was offline for about 7 years. Before that, it was a self-hosted WordPress installation for which I had, somewhat buried in my backups, an up-to-date dump of the MySQL database. Once I managed to find the backup, which had survived 5 house moves across 3 countries and my messiness, I didn't have anywhere to load it; the server that hosted the blog was long gone. I was planning on using a one-click install on DigitalOcean, since I didn't want to spend time setting up Apache, PHP, and everything else required to run WordPress. I was not sure I could plainly load this old database into a more modern WordPress, so I decided to experiment a bit running things in docker first.

The first step was to load the database into a container; I had a file called fclaude_blog2.sql containing the old dump. I started MySQL like this:

docker run --name wp-mysql -p 3306:3306 \
  -e MYSQL_ROOT_PASSWORD=nopass -d mysql \
  --default-authentication-plugin=mysql_native_password

The default authentication plugin setting is there because WordPress does not support the default authentication of the latest MySQL versions; you can also get around this by spinning up an older version of MySQL. Once the database was running, I just pushed the dump into it:

docker exec -it wp-mysql mysql -uroot -pnopass
# (typed 'create database fclaude_blog' and then closed the shell)

docker exec -i wp-mysql mysql -uroot -pnopass \
  fclaude_blog < Documents/fclaude_blog2.sql

Now I could see my data, but I wanted the XML export to load into my new WordPress, and to do so, I needed to run WordPress and connect it to this database.

docker run -ti --name wp -p 8080:80 -e WORDPRESS_DB_USER=root \
  -e WORDPRESS_DB_PASSWORD=nopass -e WORDPRESS_DB_HOST=wp-mysql \
  -e WORDPRESS_DB_NAME=fclaude_blog --link wp-mysql -d wordpress

Note that these are the settings of the MySQL container we just set up, plus the --link option so that this container can see the one running the database.

The most amazing part, at least to me, is that as soon as I accessed http://localhost:8080/wp-admin/ WordPress told me the database was in an old format and offered to upgrade it. It worked perfectly, so now I was ready to log into my blog, and get the file I needed! Except that I did not remember the password anymore!

Following the WordPress documentation, I found that the password in the database is stored as an MD5 hash. Per their recommendation, you can write the password in a text file, run the following command, and then delete the file (hopefully):

tr -d '\r\n' < pass.txt | md5sum | tr -d ' -'

I went into the MySQL container and updated the users table, setting the password field to the result of that command. And that was it! I could then export my data and load it into this blog.

After this, I just did:

docker kill wp wp-mysql
docker rm wp wp-mysql

And my machine is not left full of stuff I never intended to install permanently in the first place.

Of this whole experiment, the part I'm most surprised by is how WordPress handled the old database. I have no idea what changed over the years, but my database is now up to date and working. Kudos to WordPress for this!

Re-launching blog

This blog was offline for a long time, about 7 years! I've now spent a bit of time bringing it back, with some success: I managed to get all the content back and, amazingly, the comments with it. The only thing I could not find in my old backups was some of the figures.

I’ll go through some posts to mark where things may not work, but other than that, I’ll just assume they are old enough to be happy with whatever I got back.

So now, for a warm-up post: what happened in these last 7-8 years? LOTS! I'll summarize it in a chronological list; maybe I'll dig deeper into some items later as separate posts:

• Finished my PhD
• Moved back to Chile
• My girlfriend moved to Chile
• Got a position as a professor at Universidad Diego Portales
• Co-Founded an online-backup company called Cloner
• Birth of my son
• Invested time/money in a few startups that didn’t work
• Birth of my daughter
• Got married
• Moved to the US and joined a startup
• Worked at eBay

Those are the highlights. Currently, I'm living in the Bay Area and working at LinkedIn.

Programming in Go

I've been programming in Go for a couple of months now (as hinted by my previous post), so I thought it would be interesting to talk about my experience so far. Of course, the best way to form your own opinion about the language is to try it.

In summary, I love Go! I know it has some drawbacks at the moment with respect to C++, yet the benefits I've gained by using Go outweigh them. Let's start with the benefits:

• Concurrent programming: Goroutines are awesome: you can launch a lightweight thread just by adding "go" in front of a function call, which makes writing concurrent software amazingly simple. As I said in a previous post, Go does not provide something new in this sense; Erlang has a similar feature in "spawn".

• The memory model: You just don't have to worry about whether stuff is allocated on the stack or the heap. Go also has a garbage collector, so you don't have to free resources when you no longer need them. This simplifies your life quite a bit, but it has some drawbacks: if you keep a pointer to something you don't need anymore, the garbage collector cannot reclaim that memory (the GC can't guess your intent, of course). Then again, you can think of that as forgetting to free the space anyway :p.

• Multiple return values: A function in Go can return more than one value. Here is a quick example:

func sqrt(i int) (int, error) {
    if i < 0 {
        return 0, errors.New("Can't compute the square root of a negative number")
    }
    return int(math.Sqrt(float64(i))), nil
}

The function returns (0, some error) when the argument i is negative, and (s, nil) when i is non-negative, where s is the square root of i. What a poor choice for the variable name! :P

• Errors: The way you handle errors in Go is very C-like, with a couple of differences. One is that you have to be explicit when ignoring an error. Consider the previous function; we could call it in the following ways:

s, err := sqrt(i)
if err != nil {
    fmt.Printf("There was an error")
}

or

s, _ := sqrt(i)

But we can’t do this:

s, err := sqrt(i)

and then ignore err. Go complains about every single thing you don't use, so if you don't do anything with err, you get a compile-time error. This is pretty amazing when you read other people's code: you have all the information right there, and it is clear what could go wrong and how it is being handled.

• Channels: channels are part of the language and allow you to communicate between different goroutines. Their usage is amazingly simple and at the same time powerful. Here is a small example:

func genString(c chan string) {
    s := generateStringExpensive() // generate the string (expensive)
    c <- s
}

func main() {
    c := make(chan string)
    go genString(c) // compute the string in a goroutine
    var s string
    select { // wait for the goroutine to send the result
    case s = <-c:
        fmt.Println(s)
    }
}

• Interfaces: They remind me of Haskell classes. It’s a very simple idea, you define a set of methods and any structure implementing those methods satisfies the interface. You don’t have to be explicit about it.
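A minimal illustration of that implicit satisfaction (the names here are made up for the example):

```go
package main

import "fmt"

// Named is satisfied by any type with a Name() string method;
// no explicit "implements" declaration is needed anywhere.
type Named interface {
	Name() string
}

type User struct{ name string }

// User satisfies Named simply by having this method.
func (u User) Name() string { return u.name }

func greet(n Named) string { return "Hello, " + n.Name() }

func main() {
	fmt.Println(greet(User{name: "Ada"})) // prints "Hello, Ada"
}
```

Because satisfaction is structural, greet works with any type that grows a Name() method later, without touching its declaration.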

• Maps: Yes, almost all languages have them. Still I like the defaults and so far, their performance has felt pretty good.

• The tools: The Go compiler comes with a lot of tools. The syntax is so strictly defined that there is a tool that formats the code for you: there are no styles, just one way of writing it (or at least according to this program). The build, install, and get commands are great! I like the fact that installing packages from github, googlecode, and similar sites is so simple: a couple of commands (usually just one) and you are ready to use the library you downloaded.

• The library: The standard library that ships with Go has a great design. Every time I have to use something from it, it's amazing how quickly you can get up and running. One great example is the sort package; just take a look.

Many defaults in Go just make sense. For example, you have the types int, int32, and int64; for storage, you read and write integers of a known size, and you can detect problems when reading. This was a recurrent problem for me in C++: I would write a long to a file on a 64-bit machine and later read it on a 32-bit machine, and usually I wasn't careful enough, so I ended up with problems on the other machine.

Now let's move on to the things I don't like that much, or that I miss.

• Generics: I miss some kind of generics, or better type inference; what exists right now is very basic. I have no clue whether this is going to change, but I would love to see type inference more like what Haskell offers. Anyway, interfaces solve many of the problems you would normally solve with generics, so it's not too bad.

• Memory limitation: Currently there is a memory limitation: your process can't use more than 16GB. It is possible to fix this (unlike in Java, where you can't allocate an array with more than 4G elements); for a quick fix, take a look at my previous post.

• Sometimes it feels a bit too strict: I know, the language is designed to enforce that all variables are used, all imports are required, etc. Yet sometimes you end up adding and removing the same import many times. A classic example is when you are checking your code: I use fmt and log intermittently, and I have to go up and comment them out when I don't use them. Similar things happen because there is no coercion between int and int32, for example. This avoids bugs, but sometimes (when you know it won't fail) it kind of bothers. Just to clarify, given a comment on Twitter from Alex Bowe: in general, I like these features; it's just that in some simple cases they feel like overkill (and I can't think of a good solution for those simple cases, so it's just a warning that you may have to type or delete more than usual at some points :p).

Anyways, even though there are some things I don’t like that much (at least in certain cases), I’m really happy with Go. If you feel like learning it after reading this post, go here.

Increasing the memory limit in Go

Yeah, it’s been a while, but instead of talking about why I haven’t written anything, I’ll just jump into the good stuff.

I’ve been developing in Go lately. If you haven’t tried it yet, I totally recommend it. Honestly, when you first look at the language, you tend to think that there is nothing new in there. That may be true. Many of the features are not unique, and you can find them in other languages, yet the combination is unique. After one week of using Go I didn’t want to move back to C++ (what I usually use). There are a couple of things I miss from time to time, but the gain is so big that I don’t mind.

The problem. I've had a couple of problems dealing with the kind of software I usually implement. Today I found one that worried me a lot: I can't use more than 16GB of RAM! Past that point, my code just crashes with an "out of memory" error. In general this is not too bad, but for the particular algorithm I'm running, I need more than 16GB. Luckily, I found out that the limitation is temporary and that you can lift it by changing just a couple of lines in Go's source code. So here it goes.

I will assume you installed Go in $HOME/bin/go (if not, don't worry, we are going to re-install it right there). The first step is to get the source for Go:

$ wget http://go.googlecode.com/files/go1.0.2.src.tar.gz

Once that finishes, decompress the package:

$ tar xvfz go1.0.2.src.tar.gz

Now, using your favourite editor, change the following files:

• In file go/src/pkg/runtime/malloc.h, line 119, change "MHeapMap_Bits = 22," to "MHeapMap_Bits = 25,"
• In file go/src/pkg/runtime/malloc.goc, line 309, change "arena_size = 16LL<<30;" to "arena_size = 128LL<<30;"

And that's it; now let's install it:

$ cd go/src
$ ./all.bash
$ cd ../..
$ mkdir -p $HOME/bin
$ mv go $HOME/bin/

Now we are ready. Remember to set the environment variable in your .bashrc:

export GOROOT=$HOME/bin/go
export PATH=$PATH:$GOROOT/bin

You are ready to enjoy 128GB. If you want 64GB, change the 25 to 24 and the 128 to 64. This is supposed to be experimental, so please don't blame me if things crash; there has to be a reason for that limitation to be there. In my case, I was able to get away with this and run my code. Here it is the (expensive) construction of a compressed text index, so the program runs, writes its output, and that's it; afterwards I process the output with other binaries. I'm just happy if it manages to get through the construction, without much care for the resources it uses at the time.

Posted in Programming | 2 Comments

SXSI

After a while, I can happily say that our SXSI system is available for download. Kim created a package that can be easily installed and tested. In this post, I include a step-by-step tutorial on how to build the system and index a sample XML file.

First, we need to install the dependencies. In this case, I'll show how to do it in Ubuntu 11.10:

$ sudo apt-get install ocaml-ulex ocaml-findlib ocaml-nox libxml++2.6-dev camlp4

$ tar xvf sxsi.tar.gz
$ cd sxsi/libcds
$ make
$ cd ../xpathcomp/src/XMLTree
$ make clean all
$ cd ../../
$ ./configure
$ ./build
$ cd xpathcomp

This generates a binary called "main.native", which allows you to index and query XML files. Let's first generate an XML file with the standard XMark tool:

$ ./gen_xml -f 1 > sample.xml

Now, in order to index this file we run:

$ ./main.native -v -s sample.srx sample.xml ""
Parsing XML Document : 84729.9ms
Mem use before: VmRSS: 5380 kB
Mem use after: VmRSS: 117652 kB
Building TextCollection : 189371ms
Mem use before: VmRSS: 117916 kB
Mem use after: VmRSS: 237264 kB
Building parenthesis struct : 303.961ms
Mem use before: VmRSS: 237264 kB
Mem use after: VmRSS: 237528 kB
Tags blen is 8
Building Tag Structure : 4715.13ms
Mem use before: VmRSS: 237528 kB
Mem use after: VmRSS: 214988 kB
Number of distinct tags 92
Building tag relationship table: 1209.822893ms
Parsing document: 280366.960049ms
Writing file to disk: 115.392923ms
character 0-0
Stream.Error("illegal begin of query")

Yes, I gave an empty query, but I didn't want to search anything yet ;-). We now have an indexed version of sample.xml with standard options, saved as sample.srx. You can play with different indexing options; run main.native without parameters to see the full set of options supported at the moment. The result so far looks like this:

$ ls -lh sample.*
-rw-r--r-- 1 fclaude fclaude 188M 2011-10-31 23:10 sample.srx
-rw-rw-r-- 1 fclaude fclaude 112M 2011-10-31 23:02 sample.xml

Finally, as an example, we can count the number of results for the query “/site/regions/africa”:

\$ ./main.native -c -v sample.srx /site/regions/africa
Mem use before: VmRSS:	    5380 kB
Mem use after: VmRSS:	    7492 kB
Mem use before: VmRSS:	    7492 kB
Mem use after: VmRSS:	    7492 kB
tags_blen is 8
11 MB for tag sequence
Mem use before: VmRSS:	    7492 kB
Mem use after: VmRSS:	   27492 kB
Mem use before: VmRSS:	   27492 kB
Mem use after: VmRSS:	   29076 kB
Mem use before: VmRSS:	   29076 kB
Mem use after: VmRSS:	  194604 kB
Parsing query: 0.061035ms
Parsed query:
/child::site/child::regions/child::africa
Compiling query: 0.048876ms
Automaton (0) :
[the automaton's state sets and alternating transition table were printed here, but the state subscripts did not survive the archive]
Execution time: 1.343012ms
Number of results: 1
Maximum resident set size: VmHWM:	  200084 kB

And that's it. Now you can play further with SXSI!