Algorithm correctness

There is well known story, about a math teacher in the elementary school. He got mad and told the kids:
– Find the sum of the numbers 1 to 100! I want the answers by the end of the class!

Most of the kids started summing of the numbers as follow:

1 + 2 + 3 + 4 + 5 + ... + 100

Probably if they had a computer they would write simple program which calculates the sum of the numbers of 1 up to 100.  The program would look as:

var sum = 0;
for (var i = 1; i <= 100; i++) {
    sum += i;
}

They constructed an algorithm with linear complexity – О(n). Obviously with this approach is quite tedious to solve the exercise on the paper.

Twenty seconds after the teacher gave the question, one kid give the correct answer. The name of the kid was Carl Friedrich Gauss – one of the greatest mathematician of all the time.
Gauss rearranged the numbers in pairs of sum 101:

100 + 1 = 101
99  + 2 = 101
98  + 3 = 101
 ... 
50 + 51 = 101

He noticed there are 50 pairs with sum of 101 and therefore the right answer is: 50 * 101 = 5050. The Gauss method is definitely faster both on paper and in form of computer program, even though the multiplication complexity is not O(1), in modern computer it’s reduced as much as possible.

Now suppose you have to write function which accepts an integer N greater than 0. It should return the sum of 1 to N inclusive.

The first algorithm apparently returns correctly the sum of the first N positive integers. The problem with it, it’s not effective, especially for big N.  We should try to apply the Gauss method. The function would be:

function sum(n) {
    return n * (n + 1) / 2; 
}

The code seems pretty good, but how we can be sure it works correctly for any N? Unit tests are good, but they can test only for partial values of N. If the tests fail for particular value, we can consider the approach as incorrect.

In  the our case the code work correctly for example:

sum(5); //15
sum(10); //55
sum(100); //5050

These tests return correct result, but there is no way to pretty sure that code would work with every value of N. We should use more sophisticated approach.
Mathematical induction is pretty good variant to prove the formula is valid for every N. With the method of the mathematical induction, firstly we have to prove the correctness of the formula for initial or base step of the induction. If it’s true, make inductive hypothesis for N, then using the hypothesis we will try to prove this formula is true for N+1. If we could prove it, then it’s true for every N.

induction

So we proved the method of Gauss is valid for every value of N. Mathematical induction and strong mathematical induction are helpful tool to prove the correctness of particular algorithm. Especially the strong induction could be used to prove of more complicated and recursive algorithms.
The theory of program correctness is not start and finish with the method of induction.

Reference:

Discrete Mathematics and Its Applications

 

Faster String#repeat method, with ancient Egyptian theory

In the upcoming JavaScript standard ECMA-262-6 will have String.prototype.repeat method. You can check the current support of ES6 in the major engines in the kangax table: http://kangax.github.io/es5-compat-table/es6/

As every new feature of the language, the developers looking for polyfill  in older engines. The operations with strings are limited and we can’t directly multiply the string with a given multiplier. We could achieve repeat method, with additive operation or in the language of strings, with concatenation. So, the obvious implementation:

function repeat(str, times) {
    var s = '';
    while (times-- > 0) {
        s += str;
    }
    return s;
}
repeat('a', 5); //aaaaa

Or more idiomatic approach for JavaScript:

function repeat(str, times) {
    return Array(times + 1).join(str);
}
repeat('a', 5); //aaaaa

Both approaches have linear complexity O(n), which for small values of the multiplier is not a problem. The problem comes when the multiplier is bigger. Suppose you have to repeat a certain string in your text editor N times. How would you repat the string? According the algorithm above you will copy the string and paste N times. This is not only ineffective but also and quite boring. Try to repeat 'foo' 20 times.

The multiplication in ancient Egypt

The ancient Egypts has been quite intelligent and advanced in the math.  When they have had to multiply big numbers they are used fast algorithm for that time. Consider they have had to calculate 13 * 21. From the left side they put 1 on the right side the value of the multiplicand. On the next row they doubled the values from previous while the value in first row is less than the multiplier.  Let me illustrate with <code>13 * 21</code> :

   | Multiplicand 
------------------
1  | 13 
2  | 26
4  | 52
8  | 104 
16 | 208

After that they use the values from the first column for which the sum is equal to the multiplier. In the our case:

   | Multiplicand 
------------------
1  | 13 
4  | 52
16 | 208
------------------
21 | 273

And the sum of the second column is the result of multiplication operation. So, 13 * 21 = 273.

What we can noticed with that example? The values in the first column are powers of 2. Obviously, the Egyptians are discovered how powerful is the binary system. What is more important for us, they have multiplied with logarithmic complexity using only additive operation.

Turning in code

While on the paper is easy for us to pick up the right values from the first column, when we implement the code we need to have right criteria which values to use, and which should be discarded.

We will add two new columns to the Egyptian algorithm. One with multiplier and one with the value of modulo division of the multiplier with 2. The difference is that we will not doubled the multiplier on each row, but we will integer division with 2. Let see again the example with 13 * 21.

   | Multiplicand | Multiplier | Multiplier mod 2 
--------------------------------------------------
1  | 13           | 21         | 1
2  | 26           | 10         | 0
4  | 52           | 5          | 1
8  | 104          | 2          | 0
16 | 208          | 1          | 1

It is easy to see we should add the value in the second column to final result only if the result of modulo division in the 4th column is 1.

Turning this in code is definitely more easy and we can implement the algorithm as:

function repeat(str, times) {
    var res = '';
    while (times > 0) {
        if (times % 2 == 1) {
            res += str;
        }
        str += str;
        times >>= 1;
    }
    return res;
}

repeat('a', 5); //aaaaa

This algorithm is with O(log n) complexity and the difference with the previous two is really big especially with bigger multipliers.

This algorithm is not applicable only on strings. It can be used for an object which supports concatenation operation. Therefore in JavaScript can be used to repeat the values of array.

In my LZ library I implement this method to support and arrays: https://github.com/abozhilov/LZ/blob/master/lz.js#L996

E.g.

lz.repeat('a', 5);  //aaaaa
lz.repeat([1], 10); //[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ]

Conclusion

This is just an example how the good theory can be used for better and faster algorithms. Hope ancient Egyptians like JavaScript and the implementation above.
You can test the approaches in jsPerf – http://jsperf.com/faster-string-repeat