Running
[Agent practical 9 of 9]
Now let's have a look at calibrating our model.
There's nothing to stop you calibrating a model by taking an educated guess at the necessary variable/parameter values. Plenty of people do it, usually with some kind of expert input. Equally, you can derive some variable/parameter values from the literature. However, chances are there will be some values you need to derive by trying various options and assessing how good they are at predicting data gathered from the real world.
If you need to go through this automatic calibration (and most models could do with it), you pretty much have two choices: you can either try every combination of values for a set of variables, or you can pick, in some intelligent way, those most likely to work. The trouble with trying every value is that in non-linear models, like most ABMs, variables interact unpredictably, so you really need to try every *combination* of variable values. This can quite easily be more model runs than is feasible.
If you've got 3 variables, each with 100 possible values, you need 1,000,000 model runs to test all possible combinations. The last model run in our Model2.bat took 0.026 seconds on an ok-ish PC, so a complete exploration of the "solution space" to find the best parameter values would take ~7.2 hours. But for 5 variables, it would take 8.2 years, and for 10 variables, 82,445,459,158 years. As the sun will probably swallow the earth in 7,433,000,000 years, you would probably need to factor some kind of space travel into your ABM if you were considering this route.
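If you want to check the arithmetic yourself, here's a quick back-of-the-envelope sketch. The 0.026 seconds per run and 100 values per variable are just the figures assumed above, not anything measured for you:

```java
// Rough cost estimate for exhaustively searching the parameter space.
public class SearchCost {
    public static void main(String[] args) {
        double secondsPerRun = 0.026;        // assumed time for one model run
        int valuesPerVariable = 100;         // assumed possible values per variable
        for (int variables : new int[]{3, 5, 10}) {
            double runs = Math.pow(valuesPerVariable, variables);
            double years = runs * secondsPerRun / (60.0 * 60.0 * 24.0 * 365.25);
            System.out.println(variables + " variables: " + runs
                    + " runs, ~" + years + " years");
        }
    }
}
```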
More usually, people try to design systems that will intelligently explore the solution space, concentrating on the most promising sets of parameter values -- that is, they use "computational intelligence" to do the job for them. A popular methodology for this is Genetic Algorithms (GAs).
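Just to make the idea concrete, here's a minimal toy GA in Java. Everything in it is made up for illustration: the fitness function simply scores distance to an arbitrary target pair of values, whereas in real calibration it would run the model with those parameters and compare the output with observed data.

```java
import java.util.Arrays;
import java.util.Random;

/** Toy GA sketch: evolves two parameters towards an arbitrary target. */
public class ToyGA {
    static Random rnd = new Random();

    // Stand-in fitness: higher is better. In a real calibration this would
    // run the model with these parameters and score the results against data.
    static double fitness(double[] genes) {
        double[] target = {5.0, 200.0};      // arbitrary "true" values
        double error = 0.0;
        for (int i = 0; i < genes.length; i++) {
            error += Math.abs(genes[i] - target[i]);
        }
        return -error;
    }

    public static void main(String[] args) {
        int popSize = 20;
        double[][] pop = new double[popSize][2];
        for (double[] individual : pop) {            // random initial population
            individual[0] = rnd.nextDouble() * 10.0;     // e.g. a rate-like parameter
            individual[1] = rnd.nextDouble() * 1000.0;   // e.g. a threshold-like parameter
        }
        for (int gen = 0; gen < 50; gen++) {
            // Sort best-first by fitness.
            Arrays.sort(pop, (a, b) -> Double.compare(fitness(b), fitness(a)));
            // Replace the worst half with mutated crossovers of the best half.
            for (int i = popSize / 2; i < popSize; i++) {
                double[] mum = pop[rnd.nextInt(popSize / 2)];
                double[] dad = pop[rnd.nextInt(popSize / 2)];
                pop[i][0] = (rnd.nextBoolean() ? mum[0] : dad[0]) + rnd.nextGaussian();
                pop[i][1] = (rnd.nextBoolean() ? mum[1] : dad[1]) + rnd.nextGaussian() * 10.0;
            }
        }
        Arrays.sort(pop, (a, b) -> Double.compare(fitness(b), fitness(a)));
        System.out.println("Best parameters found: " + Arrays.toString(pop[0]));
    }
}
```

The structure is the important bit: a population of candidate parameter sets, a fitness score for each, and repeated selection, crossover, and mutation so the search concentrates on the promising parts of the solution space.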
Let's have a go at calibrating our model with a GA. First up, we have to have a real system to model. But what to model? How about us!? Let's model a lab full of people reading information at computers. With a little thought about inputs, we should be able to repurpose our core model for this.
For our model, we'll set up a landscape where most of the room is zeros, but in which 26 cells (computers) are full of information; let's say 1320 minutes' worth of horrifically complicated Java lectures. Here's a file containing our data: room.txt. Let's say, just to disconnect us from the model a little, that we've got 10 agents, and let's give them a maximum time of 3360 minute-long iterations. It's our model, so we know how it behaves: the agents will start at random positions, randomly find a computer (after which no other agent will try to take that computer, because of the neighbourhood), and then sit and consume reading material at a given rate. After a while (as indicated by our new fullUp and stomach variables) they'll need a break and will leave their computers, restarting near the lab door. They'll then find another (or the same) machine.
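To pin that behaviour down, here's a very stripped-down sketch of what one reader-agent's iterations might look like. None of the names or numbers come from the real model code, and it squeezes everything into one agent and one computer purely so it runs standalone:

```java
import java.util.Random;

/** Minimal stand-in for one reader agent (all names and values are guesses). */
public class ReaderSketch {
    static Random rnd = new Random();
    double stomach = 0.0;        // material read since the last break
    double eatingRate = 10.0;    // minutes of material consumed per iteration
    double fullUp = 200.0;       // reading level that triggers a break
    boolean atComputer = false;

    /** One minute-long iteration against a single "computer" holding data. */
    double step(double computerData) {
        if (!atComputer) {
            atComputer = rnd.nextDouble() < 0.1;  // stand-in for wandering to a machine
            return computerData;
        }
        double bite = Math.min(eatingRate, computerData);
        stomach += bite;                          // consume reading material
        if (stomach >= fullUp) {                  // full up: take a break
            stomach = 0.0;
            atComputer = false;                   // head back towards the door
        }
        return computerData - bite;
    }

    public static void main(String[] args) {
        ReaderSketch reader = new ReaderSketch();
        double data = 1320.0;                     // one computer's worth of lectures
        for (int i = 0; i < 3360; i++) data = reader.step(data);
        System.out.println("Material left after 3360 iterations: " + data);
    }
}
```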
Our task, then, is to find an average eatingRate and fullUp level that matches up with the real reading and break rates of our people. We haven't got any real data, so here's some made-up data on levels of reading consumption after 3360 minutes: real.txt. We'll use a GA to try a variety of different eatingRate and fullUp values, running the model each time and determining whether the results are any good by comparison with this file.
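By way of illustration, one plausible way to score a run is to get the model's final landscape and the real data into 2D arrays and sum the squared cell-by-cell differences. The array format and the squared-error measure are assumptions here, not the practical's own code; how you read real.txt and which error measure you prefer may well differ:

```java
/** Sketch of a fitness score: lower means a closer match to the real data. */
public class FitnessSketch {

    static double fitness(double[][] modelResult, double[][] realData) {
        double total = 0.0;
        for (int i = 0; i < realData.length; i++) {
            for (int j = 0; j < realData[i].length; j++) {
                double diff = modelResult[i][j] - realData[i][j];
                total += diff * diff;             // squared error per cell
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Tiny made-up grids just so the sketch runs; in practice these would
        // come from a model run and from real.txt respectively.
        double[][] model = {{0, 120}, {300, 0}};
        double[][] real  = {{0, 100}, {330, 0}};
        System.out.println("Fitness (lower is better): " + fitness(model, real));
    }
}
```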
Ok, so, how do we do this?