The Nine Billion names of God

Daniel - Dec 19, 2007 - Literature Code

Arthur C. Clarke wrote a classic SF short story in the 1950s called The Nine Billion names of God, in which he discusses a computer designed and built for a Buddhist monastery to calculate the billions of possible names possible for God.

Clarke imposes a number of conditions upon the possible names:

  • They must be no more than nine characters long1 (whether this includes names of fewer than nine characters is ambiguous)

  • They use characters from an unknown alphabet of an unspecified size2

  • No characters can be repeated three times in succession3

An interesting question is raised: which alphabet do the monks use? Determining the number of characters in such an alphabet, given the above rules, sounds very much like a problem one would expect to find on Project Euler.

There are two noteworthy clarifications which must be confirmed before approaching this problem. First, what do we mean by "Nine Billion"? Most English speaking countries in the present day would define this value as 109 - however we must remember that Arthur C. Clarke wrote this particular short story in the 1950s, some twenty years before Britain adopted the short scale system for large numbers. It is therefore likely that Clarke intended the number to be 1012.

Second, are names of all lengths up to nine characters to be included? The phrasing "not more than nine letters" would seem to imply so -- and as will be shown, it makes little difference anyway for combinations of large numbers of characters, each successive name length being orders of magnitude greater than the last.

The obvious starting point is to determine the maximum number of possible names, without regard for the rule that no character can occur three times successively. This will help determine the minimum alphabet length, and can be expressed simply as follows, where C is the number of possible combinations, and A is the alphabet length:

Names of God

  • When A = 27, C = 7.92 billion
  • When A = 28, C = 10.97 billion
  • When A = 29, C = 15.03 billion

Obviously, none of these results can be rounded to give nine billion; although when it is considered that these results will be reduced by filtering invalid names (i.e. names with three identical successive characters) an alphabet size of 28 characters would appear to be the most hopeful.

Approaching this problem proved to be interesting. I initially attempted it mathematically: if the totals above are considered as a starting point, the objective would be to reduce those totals by the number of possible invalid names. In theory, for every available character, and every available index in the name barring two, there would be an invalid name; if we take the letter A, for example, with X representing other arbitrary characters, the following would be potentially invalid names:

AAAXXXXXX XAAAXXXXX XXAAAXXXX XXXAAAXXX ...

And so on. Thus for a name of size nine, there are seven possible indexes of such triplets -- and for each possible triplet, 266 possible invalid names, given by the number of combinations of arbitrary characters surrounding the triplets. Thus, where C is the number of possible combinations and A is the alphabet length:

Names of God

As might be expected, this solution is inadequate. The flaw is simple - it fails to account for names containing multiple triplets. For example, the name AAABBBEFG would be counted twice using the above method, thus the resulting number of combinations would be lower than the actual number. With regard for the complexity of the mathematics involved and the limited amount of abstraction my brain can handle, I decided to give this approach a rest, and attempted to solve the program algorithmically.

Python is my normal language of choice, however this problem requires vast numbers of iterations and undoubtedly I'd be sitting waiting for weeks if I didn't work with something faster. So I knocked together a brief C script.

I won't post my initial attempt here; suffice to say I would have been waiting a number of days if I had left it to run. I opted to run through each combination (all 10.97×1012 of them) and test for triplets; while this would undoubtedly have worked, brute-forcing a solution was hardly ideal. I spent some time considering how such an algorithm could be optimised, and came up with the following:

  • If a triplet is found, there is no use iterating over every possible permutation containing this triplet. For example, upon reaching AABADDDAA, instead of continuing to AABADDAB, the script should skip to AABADDEAA.

  • The number of combinations of AXXXXXXXX (where 'X' is an arbitrary character) will be exactly the same as the number of combinations of BXXXXXXXX - thus the number of necessary iterations is cut by a factor of A (that being the alphabet length).

  • With the previous point in mind, the number of iterations can be cut further. The combinations starting with a given character, A being the example, will follow only two patterns:

    • The character will be followed by a different character; for example ABXXXXXXX. The total number of combinations here is simply A × (A - 1) × (number of combinations of ABXXXXXXX not containing triplets)

    • The character will be followed by the same character and then a different character, since characters can not repeat themselves sequentially more than once in order to avoid forming triplets. For example, AABXXXXXX. The total number of combinations here is simply A × (A - 1) × (number of combinations of AABXXXXXX not containing triplets)

  • Thus the number of iterations necessary is practically reduced by a factor of A2

  • The logic above works perfectly for names of four of more characters, but fails for three or less. Thankfully, the number of triplets present in the latter category are easily calculable: for names containing one or two characters, there can be no triplets, while for names containing three characters, the number of triplets contained is simply A - for example, AAA, BBB, CCC, etc.

The script I used is as follows, and rarely takes longer than a few minutes:

#include <stdio.h>
#include <math.h>

long long ccounter = 0; // individual name length combination counter

int main(int argc, char *argv[]) {

  // get user provided alphabet length and name length

  int alphlength = atoi(argv[1]);
  if (alphlength < 2) {
    printf("Please provide an alphabet length of 2 or greater\n");
    return 0;
  }

  int namelength = atoi(argv[2]);
  if (namelength < 3) {
    printf("Please provide a name length of 3 or greater\n");
    return 0;
  }

  long long counter = 0;

  // calculate combinations for name lengths 1-3

  int i;
  for (i=1; i<=3; i++) {
    counter += pow(alphlength, i);
  }
  counter -= alphlength;

  int name[namelength];

  // loop name lengths 4+

  for (i=4; i<=namelength; i++) {

    ccounter = 0;

    name[0] = 0;
    name[1] = 1;

    combinations(name, 2, alphlength, i);

    name[0] = 0;
    name[1] = 0;
    name[2] = 1;

    combinations(name, 3, alphlength, i);

    printf("%lld\n", alphlength * (alphlength-1) * ccounter);

    counter += alphlength * (alphlength-1) * ccounter;
  }

  printf("result: %lld\n", counter);

  return 0;
}

// determines combinations for given alphabet length and name length, accepting name array and starting index

int combinations(int name[], int index, int alphlength, int namelength) {
  int i;
  for (i=0; i<alphlength; i++) { // recursively iterate combinations
    name[index] = i;
    if (name[index-2] == name[index-1] && name[index-1] == name[index]) {
      continue; // skip entire subsection of combinations
    }
    if (index < (namelength-1)) { // is the name the right length?
      combinations(name, index+1, alphlength, namelength);
    }
    else {
      ccounter++;
    }
  }
  return 0;
}

  1. "...all such names can be written with not more than nine letters..."  

  2. "...in an alphabet we have devised..."  

  3. "...no letter must occur more than three times in succession..."  

JT

Challenge accepted. Removed recursion and used Perl's internal counting mechanism.

use strict; my $root = "a";

while($root le "zzzzzzzzz") { print $root . "n"; $root++; }

Darren Poulson

I haven't read the story for quite a while, but I do seem to remember that one of the limitations was that a letter could not be repeated more than three times in a row, ie. zzzaaazzz was valid, but zzzaaaazz was not. This may have some effect on the total number of possibilities.

Darren.

Ken Long

9 Billion names, but not all the combinations were valid names. Hence A could well equal 28 or more.

Anonymous

i hate u

pozorvlak

I just modified JT's code to strip out strings with letters repeated four times in a row (print "$rootn" unless $root =~ /(.)1{3}/), then ran some timings and determined it would take over 600 hours to finish. It's possible to work out the numbers using the inclusion-exclusion principle, but it's going to be a messy calculation. Still, nice to know that sometimes pencil and paper can be faster than computing :-)

Name