Procedural Generation of Names

One of the things that I mentioned a few times on my list of content to be generated was names, be it the name of the Galaxy itself, of Species, of individual Stars, or whatever. Names are important, and they give a very strong feel for the thing being named. This is even more important in a text-based game.

My name generation is going to be based on Markov Chains. These are a very simple yet powerful way to generate reasonable-sounding language that still has plenty of randomness in it. Essentially it works in two phases: building up the initial data, and generating values from it.

Building up of Initial Data

The Initial Data is built up by feeding in a set of pre-existing strings. We then pull these strings apart to see how they are constructed. This is done in two steps.

Firstly, we extract the stem, or starting prefix, from each string. This is the first n characters of each string, where n is our “prefix length”.

Note – the prefix length is very important. A longer prefix length means you have more coherent results, but there is going to be less variation. A shorter prefix length will give more variation but less coherence. The sweet spot is somewhere in the middle.
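
To make this first step concrete, here is a minimal Python sketch of pulling the starting prefixes out of a list of strings. The function name `starting_prefixes` is purely illustrative, and skipping strings shorter than n is an assumption of this sketch rather than something the technique requires.

```python
def starting_prefixes(names, n):
    """Return the first n characters of every name that is at least n long."""
    # Assumption: names shorter than n cannot supply a full prefix, so skip them.
    return [name[:n] for name in names if len(name) >= n]


names = ["alice", "bob", "carol"]
print(starting_prefixes(names, 1))  # ['a', 'b', 'c']
print(starting_prefixes(names, 2))  # ['al', 'bo', 'ca']
```

With a longer prefix length, each starting prefix already carries more of the original string, which is where the extra coherence (and reduced variation) comes from.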

Secondly, we build a map of “prefix” to “next character”. We do this by walking along every single string from the start, and for every run of n characters we record the character that immediately follows it.

That sounds complicated, so let’s go over a real example. Take the input strings “alice”, “bob” and “carol”, and a prefix length of 1:

  • Our prefixes are “a”, “b” and “c”
  • Our mappings of prefixes to characters are:
    • For “alice”
      • a -> l
      • l -> i
      • i -> c
      • c -> e
    • For “bob”
      • b -> o
      • o -> b
    • For “carol”
      • c -> a
      • a -> r
      • r -> o
      • o -> l

This gives us an overall set of chains:

  • a -> l, r
  • b -> o
  • c -> e, a
  • i -> c
  • l -> i
  • o -> b, l
  • r -> o
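
The mapping step above could be sketched in Python roughly as follows. The name `build_chains` is hypothetical, and storing each next character only once per prefix is an assumption made to match the list above. Keeping duplicates instead would make common transitions more likely to be picked during generation.

```python
from collections import defaultdict


def build_chains(names, n):
    """Map every n-character run to the characters seen directly after it."""
    chains = defaultdict(list)
    for name in names:
        # Stop n characters before the end so there is always a next character.
        for i in range(len(name) - n):
            prefix = name[i:i + n]
            next_char = name[i + n]
            # Assumption: record each follower once, in first-seen order.
            if next_char not in chains[prefix]:
                chains[prefix].append(next_char)
    return dict(chains)


chains = build_chains(["alice", "bob", "carol"], 1)
for prefix in sorted(chains):
    print(prefix, "->", ", ".join(chains[prefix]))
```

Run against the three example strings with a prefix length of 1, this prints the same seven chains as the list above.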

Generating Values from Data

Once we have our initial data, we can generate names from it. In order to do this, we need a decent random number generator, and our input data. The process is then:

  • Randomly generate the length of the name
  • Randomly select a prefix from our prefix list
  • Until the generated name has reached our length
    • Look up the set of possible next characters for the last n characters that we have generated so far
    • Randomly pick one of these characters and add it to the end

At the end of this, we will have a new string that roughly follows the rules that our input strings followed.
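
As a rough illustration of that process, here is a minimal Python sketch. The function name, the `min_length` and `max_length` parameters, and the hard-coded `PREFIXES` and `CHAINS` tables (copied from the example data) are all assumptions made for the sake of a self-contained example. The steps above also don't say what to do when the last n characters have no known next character (for instance “e” in the example data), so this sketch simply stops early in that case.

```python
import random

# Example data from the walkthrough, with a prefix length of 1.
PREFIXES = ["a", "b", "c"]
CHAINS = {
    "a": ["l", "r"], "b": ["o"], "c": ["e", "a"], "i": ["c"],
    "l": ["i"], "o": ["b", "l"], "r": ["o"],
}


def generate_name(prefixes, chains, n, min_length, max_length, rng):
    """Generate one name by repeatedly following the chains."""
    length = rng.randint(min_length, max_length)  # random target length
    name = rng.choice(prefixes)                   # random starting prefix
    while len(name) < length:
        followers = chains.get(name[-n:])
        if not followers:
            break  # Assumption: dead end, so accept a shorter name.
        name += rng.choice(followers)
    return name


rng = random.Random()  # pass a seeded Random for repeatable output
for _ in range(5):
    print(generate_name(PREFIXES, CHAINS, 1, 3, 6, rng).capitalize())
```

Passing the random number generator in as a parameter means a seeded generator can be swapped in, which is handy if the same galaxy needs to come out with the same names every time.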

Using our data from above, let’s generate a name 3 characters long.

  • The name starts with one of “a”, “b” or “c”. We randomly picked “c”
  • The next letter is one of “e” or “a”. We randomly picked “a”
  • The next letter is one of “l” or “r”. We randomly picked “l”
  • We’ve now reached our target length of 3, and generated “cal”, which we write as “Cal”

Note that “Cal” doesn’t actually appear anywhere in the input data, but we can look at it and it doesn’t seem unreasonable.
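
To check this against the example data, the following small Python sketch enumerates every 3-character name the chains can produce, rather than picking randomly. The helper name `extend` and the hard-coded tables are assumptions for illustration only.

```python
INPUTS = ["alice", "bob", "carol"]
PREFIXES = ["a", "b", "c"]
CHAINS = {
    "a": ["l", "r"], "b": ["o"], "c": ["e", "a"], "i": ["c"],
    "l": ["i"], "o": ["b", "l"], "r": ["o"],
}


def extend(name, chains, n, length):
    """Return every name of the given length reachable from this start."""
    if len(name) == length:
        return [name]
    results = []
    # A prefix with no followers (such as "e") contributes nothing.
    for next_char in chains.get(name[-n:], []):
        results.extend(extend(name + next_char, chains, n, length))
    return results


candidates = [name for p in PREFIXES for name in extend(p, CHAINS, 1, 3)]
print(candidates)  # ['ali', 'aro', 'bob', 'bol', 'cal', 'car']

# Keep only the names that do not occur inside any input string.
novel = [c for c in candidates if not any(c in s for s in INPUTS)]
print(novel)  # ['bol', 'cal']
```

So with this tiny data set, “cal” and “bol” are the only 3-character results that are genuinely new. The rest are fragments of the input strings.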