So I ended up remembering an old interview question I’ve read about a few years ago. I’m not very fond of those interview questions, but as a problem leading to a solution based on clever use of the techniques exposed in the class this one did fit quite well. Here’s how the question go, if my memory serves me well:
Write a program to print all the natural numbers up to 100 (inclusive) without making use of any form of loop or conditional structure and without writing the numbers by hand.
When I’ve seen this for the first time I got quite a lot of head scratching going on. I thought about crazy ways to do this with the tools I was more used to, the imperative object oriented languages and they didn’t quite feel right. Then it dawned on me that it’d be fairly simple if only I changed the focus to the tools I was used to and decided to try something else — functional programming.
You see, I had recently learned from friend of mine about something called Church Numerals when he tried to explain to me what he had learned in college about Lambda Calculus and the idea behind them inspired me to come up with a solution based on function composition.
The idea itself was very simple, too! All I needed was to codify part of the problem as a function that printed a number and incremented it, and come up with functional composition tools to call that function 100 times. So here’s how the code ended up being written in Scheme:
So that was it and I thought it was a nice solution. I shared it with my buddies on Google + and thought it was finished… except it wasn’t. Speaking with one of my colleagues at work he mentioned that it’d look quite odd in a language without support for closures, so I decided to take a stab at doing it in Java and coming up with “closures” of my own:
But then it was really unfair to Java. I mean, we’re trying to bend it to a solution using a paradigm not supported by the language! I kept thinking and came up with a very simple implementation in Java, although I do consider it cheating:
And why do I consider it cheating, do I hear you asking? That’s because, although we’re not writing any loop construct, the loop is clearly there through the recursion. And, even if we are not making any conditional processing, the compiler puts a test there to test if n is zero, and throw an exception if it is. Of course, since this version “cheats” to have a loop and conditional in disguise, it’s also the most flexible by far. It can print any number supported by the int data type as long as we don’t overflow our stack.
After reading this I figured that, if I was to cheat, I could cheat it in order to make it even simpler. I mean, if the problem only limits that we don’t write loops nor conditionals, we can easily make the compiler write them for us and call it a day. So I whipped out my C++ compiler and fed it the following code:
So, the compiler got quite busy looping over my array and calling the default constructor, stopping when he finished the whole array, but I didn’t write a single loop, right? After that I was too tired to continue coming up with weird implementations but I’m sure there are people out there that can make them quite interesting. I ended up writing an ugly beast in Scheme that did away with all the lambdas using macros, and I’m pretty sure the same can be done with C++ templates although I don’t know how. And I’m sure someone can make it work in C, although I can’t think of a version that doesn’t simulate closures right now. Anyway, that’s all for today, folks!
So, this week I had a dreadful night, I just couldn’t sleep at all. In the search for something to do with a sleepless night I decided to dissect a simple (perhaps the simplest) problem at Project Euler: Problem 1.
The problem is stated as this:
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.
Anyways, the idea of this post is to explore this problem, starting from a very naive implementation building up to alternatives, gathering knowledge of the problem and it’s solution, until we reach a good implementation.
So, here I started. Doing it in scheme, I just figured I should do it in a functional way. I ended up with the code below:
(require-extension srfi-1) (define (div-by-3-or-5 x) (or (zero? (modulo x 3)) (zero? (modulo x 5)))) (let ((result (fold + 0 (filter div-by-3-or-5 (iota 1000000 1))))) (format #T &quot;Result: ~a~%&quot; result))
It’s a simple program, obviously. A simple run-down of what it does is given below:
- First the iota function generates a lists that has all the numbers from 1 to 1000000;
- This list is then passed on to filter which returns a list that is composed of all members from it’s input that are multiples of 3 or 5;
- Now, we iterate (through the function fold) on this new list applying + (the sum operator) to it’s contents;
- The result of that is the result of our computation.
Running this program with the help of the UNIX utility time, tells us that our program isn’t exactly the fastest kid in the class:
Result: 233334166668 real 0m2.018s
It takes 2 whole seconds to do it’s job. But when we look at it again, there’s a reason for that. Our code is, right now, very stupid: It creates a whole list of 1000000 numbers in memory; then it creates a smaller list, with thousands of numbers; then it runs through this list adding the values.
We could make it so that the lists aren’t created, using streams (or iterators, or generators… pick your name). So that’s what I did:
(require-extension srfi-41) (define (div-by-3-or-5 x) (or (zero? (modulo x 3)) (zero? (modulo x 5)))) (let ((result (stream-fold + 0 (stream-filter div-by-3-or-5 (stream-range 1 1000000))))) (display result))
I substituted the list-creating primitives for their stream-creating counterparts. Unfortunately as you can see from the results below, either the implementation of Streams from Chicken Scheme is oddly slow or I’m doing something very wrong, because it actually got much slower. I’ll ask the guys from Chicken what am I doing wrong and will fix it later. For now, here are the results:
Result: 233333166668 real 0m5.019s
Of course, doing it the functional way isn’t the only way of doing it in Scheme, and it’s easier to make an iterative version that doesn’t use streams nor builds the whole list. And so I did it, with the following code:
(let loop ((i 3) (acc 0)) (if (&gt; i 1000000) (display (format &quot;Result: ~a~%&quot; acc)) (loop (+ i 1) (if (or (= (modulo i 5) 0) (= (modulo i 3) 0)) (+ acc i) acc))))
Which is simply a loop from 1 to 1000000 adding to an accumulator the sum of all the numbers that are multiples of either 3 or 5. And now we start seeing an evolution in speed! Check the results:
Result: 233334166668 real 0m1.262s
That is a mild improvement, but it’s an improvement nonetheless — the kind of improvement I expected to get from the stream based version. There are some problems with this approach, but let’s try to keep it naive, and deal with a small attempt at optimization.
The target here is that iterating through a million numbers, and permforming two million modulo operations, is bound to be slow. But figuring out what are the numbers we want is simple: we know that we can list all the multiples of a number if we start iterating from this number and add itself to the iteration every step.
That is, if we want to find all the multiples of 3, we have to do 3, 3 + 3, 3 + 3 + 3, and so on. So, so to sum out all the numbers that are multiples of 3 or 5, all we have to do is to sum all the multiples of 3 with all the multiples of 5.
Almost that simple: we are summing some of the numbers twice. You see, 30 is a multiple of 3 and of 5, and thus it’s being added by both sides of the sum. The solution for that is simple, really: We have to figure out what numbers are counted twice, but since 3 and 5 are primes we can guarantee that any number divisible by both, is also divisible by their least common multiple, which is 15.
So our algorithm is:
- Sum all the multiples of 3 from 1 to 1000000;
- Sum all the multiples of 5 from 1 to 1000000;
- Add those two sums;
- Subtract from it all the multiples of 15 from 1 to 1000000.
The code for it is as simple as the algorithm, see below:
(define (compute-sum-for n) (let loop ((i n) (acc 0)) (if (&gt; i 1000000) acc (loop (+ i n) (+ acc i))))) (let ((result-3 (compute-sum-for 3)) (result-5 (compute-sum-for 5)) (result-15 (compute-sum-for 15))) (format #T &quot;Result: ~a~%&quot; (- (+ result-3 result-5) result-15)))
The auxiliary function compute-sum-for makes it much more readable, and the result is much faster than before:
Result: 233334166668 real 0m0.366s
Now that’s the kind of performance we’re after! Right? … Well, no! It’s about three to four times faster than the best previous attempt, I give you that, but we’re talking about computers. They should be faster! From now on I can see two paths to further speed out this code: The naive and the ingenious (which should be apparent to anyone with some knowledge of math by now).
The naive option is simple: We’re in the era of multi-core computers. We have three independent sums going on, why don’t we dispatch them in three different threads and cut the processing time by a third of it?
Well, so I did it with the following code:
(require-extension srfi-18) (define (make-worker n) (lambda () (let loop ((i n) (acc 0)) (if (&gt; i 1000000) acc (loop (+ i n) (+ acc i)))))) (let ((thread-3 (thread-start! (make-thread (make-worker 3)))) (thread-5 (thread-start! (make-thread (make-worker 5)))) (thread-15 (thread-start! (make-thread (make-worker 15))))) (let ((result-15 (thread-join! thread-15)) (result-5 (thread-join! thread-5)) (result-3 (thread-join! thread-3))) (format #T &quot;Result: ~a~%&quot; (- (+ result-3 result-5) result-15))))
We build three threads, using a worker function not much different from our compute-sum-for function from before, start them, and then get their results before doing the final sum. And the speed is…
Result: 233334166668 real 0m0.445s
… slightly inferior! You see, you can’t simply assume adding threads to a computation is going to make it faster. Our computation so far was so simple that adding threads to it gave more in terms of overhead than in terms of performance gain. Keep that in mind.
That leaves us with the only option, finally adding some thought to this and stop being naive. Let’s look at what we’re doing, shall we? We detailed the algorithm before, and it’s basic block is a sum of all numbers in a sequence with fixed increment steps.
It just so happens that we have a name for those sequences. It is called an Arithmetic Progression and as it turns out we know how to sum these!
So we could modify our compute-sum-for function to use this knowledge, like in the code below:
;;;; Returns the sum of the arithmetic progression from start, up to the first ;;;; item which is bigger than the limit (exclusive), with the given step. (define (sum-ap start limit step) (let ((n (+ 1 (floor (/ (- limit start) step))))) (* (/ n 2) ( + (* 2 start) (* step (- n 1)))))) (define (compute-sum-for n) (sum-ap 0 1000000 n)) (let ((result-3 (compute-sum-for 3)) (result-5 (compute-sum-for 5)) (result-15 (compute-sum-for 15))) (format #T &quot;Result: ~a~%&quot; (- (+ result-3 result-5) result-15)))
Just added the sum-ap function which figures out from the parameters start, limit and step how many elements from our progression we have in a given range, and then uses the formula (described in that wikipedia link above) for the sum of an arithmetic progression to figure out the values. Here are the resulting time spent in this computation:
Result: 233334166668.0 real 0m0.011s
And now that’s the speed we’re looking for!
And with that we reach where I wanted to go: Naive implementations of the solution to a problem are, most of the times, great ways to get more insight into the nature of the problem and, therefore, come up with better implementations.
Please give some feedback on whatever you like: on the content, the style, the length and formatting or whatever. I kinda feel this was too long for such a simple post, but it really felt smaller in my head, when I was writing the programs.
Finally I decided to put my hands on it and kick start my blog into life. But what is this all about?
Well, I have to be honest: that truly depends on when you ask me. Only one thing is sure: It’s going to be about computers and programming.
That’s why I chose the name. I’ve been wandering through the world of computing with so many different interests, that I just feel like scheduler: Switching Contexts back and forth among different tasks and ideas.
So I can’t really tell you much about what’s about to unfold here, only the current batch of interests boiling in my mind.
I’ve been diving in the land of Lisp Dialects and Programming Language Interpreters, which means I’ll most likely discuss techniques involved and post reviews for books on the subject. I’m currently writing my own Scheme interpreter (isn’t it like a rite of passage?), so expect to hear about it in the blog, too.
I’m also very interested in the idea of using problems to teach. Taking simple (or not-so-simple) problems and discussing implementations to their solutions, with the trade-offs and ideas involved. To that effect, I’ll probably post discussions on problems from Programming Praxis and Project Euler.
But to be truly honest, that is just how it is today. Tomorrow I might be talking about compilers, build scripts, an interesting algorithm I just found out or whatever. So, welcome to the maelstrom, and I hope you enjoy your ride as much as I plan to.
The main intent behind this is to provide an outlet to my brainstorms and to gather feedback from whoever might be interested. So don’t be shy: join me in the comments section and let’s play this game together.