Chapter 1: Introduction to Simulation

Computer simulation of biological phenomena is an important and growing part of biological research. The goal of Simulating Life is to show you how useful simulation is in understanding biology. You will be using the graphical programming language StarLogo Nova (SLN) to design, construct, and evaluate your own simulations of biological phenomena using autonomous agents. SLN allows you to simulate many aspects of biology: agents may be chosen to represent molecules, cells, or organisms, and the model scenario could be virtually anything, including enzyme kinetics, cells growing in culture, cells interacting in an organism, or organisms interacting with each other and their environment. You will also have the opportunity to try using a more powerful version of Logo, NetLogo, for some of your work.

Goals of the course:

• Gain a deep understanding of biological content and the connections among different fields in biology. To do this, you will be:
• Extracting and programming the ‘rules’ of a biological system
• Comparing models with peers to find similar ‘rules’ in different systems
• Gain experience with hypothesis testing and the use of simulation in science. In doing so, you will learn how to:
• Make use of the ‘Simulation Cycle’: hypothesize / collect facts / build model / test model / modify model / frame new hypothesis.
• Explain and justify the assumptions necessary to build models.
• Use your working simulation to test hypotheses about the phenomenon you are modeling.
• Make connections between and find synergy among mathematics, computer science, and biology.
• Develop ‘systems thinking’, and understand the ‘emergent phenomena’ that arise when agents interact.

This is primarily a course designed to get you thinking rigorously and deeply about biology. You will have to think deeply when you choose a system to simulate (What part(s) should I leave in? Which should I leave out? How do these relate to my goals in making the simulation?), when you design your agents (How do I get them to behave as I want while using biologically-reasonable processes?), when you debug your simulation (Is this working? How can I tell? How can I figure out what's wrong?), and when you use your simulation to explore the phenomenon you're studying. You'll demonstrate this thinking as you talk to me and your fellow students, in your posts, and in your final project and paper.

This on-line textbook will show you the basics of programming in SLN (and NetLogo) through a series of projects that will give you practical experience with programming and simulation. The series of exercises will also give you ideas for your own projects.

1) Why make simulations?

"All models are wrong; some models are useful."

This quote, attributed to the statistician George Box, is a central motto for this class. Simulations are models of reality: they always incorporate some realistic elements as well as some non-realistic elements. In this sense, they are all 'wrong'. However, even though they are approximations to reality, they can often be useful. The key is to always bear in mind the 'right', 'almost right', and 'wrong' parts of your model and what you can conclude, given these limitations.

Why simulate biological phenomena?

Simulations are less costly, both in terms of time as well as resources, than real life. You may choose to build a simulation of thousands of organisms acting on a time scale of years. The real life version of this would be very expensive, the data collection would be difficult, and a few minutes of simulation time would correspond to years of real life observation. At the smaller scale, you may choose to build a simulation of molecules which would be difficult and expensive, if not impossible, to observe in real life.

You can conduct 'what if?' experiments easily. What if I had 10 times as many fish in the pond? What if I increase the chance of the chemical reaction? What if there were only grass, rabbits, and wolves in this ecosystem? Questions like these are difficult, if not impossible, to explore in real life but are trivial to explore in a simulation.

To build a simulation, you really need to understand what you're simulating. Building a simulation requires you to spell out all of the details of how your agents will act. To do this, you will need to learn about the system you are simulating in great detail. This deep analysis will reveal gaps in your knowledge as well as gaps in what is known about your system. Exploring these gaps will help you understand biology better and has been used to drive research in the real world.

Simulations can be beautiful. The simulation of fish schooling in response to environmental conditions was created by Kelsie Becker when she took Bio 362. She has degrees from both Mass College of Art and UMass Boston's Biology department.

Programming is fun.

This is from The Mythical Man-Month by Frederick Brooks (1975).

Sadly, the sexist terminology wasn't appropriate even in 1975, as many of the pioneers of computer programming were women (see especially Grace Hopper); that aside, this captures much of what I love about programming:

Why is programming fun? What delights may its practitioner expect as his reward?

First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design. I think this delight must be an image of God's delight in making things, a delight shown in the distinctness and newness of each leaf and each snowflake.

Second is the pleasure of making things that are useful to other people. Deep within, we want others to use our work and to find it helpful. In this respect the programming system is not essentially different from the child's first clay pencil holder "for Daddy's office."

Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning. The programmed computer has all the fascination of the pinball machine or the jukebox mechanism, carried to the ultimate.

Fourth is the joy of always learning, which springs from the nonrepeating nature of the task. In one way or another the problem is ever new, and its solver learns something: sometimes practical, sometimes theoretical, and sometimes both.

Finally, there is the delight of working in such a tractable medium. The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures. (As we shall see later, this very tractability has its own problems.)

Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

Programming then is fun because it gratifies creative longings built deep within us and delights sensibilities we have in common with all men.

2) Numerical vs. Agent-Based Simulation.

Different kinds of simulations.

In this course, we will be using agent-based simulations. These are simulations where you will program individual agents (seals, sharks, molecules, proteins, people, etc.) with particular behaviors and then turn them loose in your world to see what happens. This is not the only way to simulate biological phenomena. To make it more clear what we're up to in this course, I will briefly discuss alternative ways to simulate, for example, bacterial growth.

Bacteria 'grow' not by getting larger but by increasing in number. Each time a bacterium has enough raw materials, it divides into two daughter cells. Thus, the population doubles with each cycle of reproduction. First, you have 1 cell, then 2, then 4, then 8, then 16, etc. This is called exponential growth because it can be modeled by an exponential equation, which we will see in the next section.

Equation-based numerical simulation. The increase in population can be described by the following equation:

N is the number of bacteria; k is the growth rate; and T is the time: $$N = e^{kT}$$
You can plot a graph of this equation and you'll get something like this:
This gives exact numerical results but only works for situations that can be described by an equation. Many systems can be described by an equation, and researchers are developing more sophisticated equations all the time, but the number of biological systems that are well described by equations is limited.
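As a minimal sketch, the equation above can be evaluated directly in Python. The function name `population_at` and the growth rate k = 0.5 are illustrative assumptions, not values from this course; note that the equation as written starts from a single cell at T = 0.

```python
import math

def population_at(t, k):
    """Return N = e^(k*t): the population at time t, starting from one cell."""
    return math.exp(k * t)

# Hypothetical growth rate, chosen only for illustration.
k = 0.5
for t in range(0, 11, 2):
    print(f"T = {t:2d}  N = {population_at(t, k):8.2f}")
```

Because the answer comes straight from the formula, the same T and k always give the same N; there is no randomness and there are no individual bacteria in this kind of model.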

Step-by-step numerical simulation. Another way to express the growth rate is to say that each bacterium has a certain chance, k, of reproducing in each second of time. Since each bacterium divides on its own, the number of newborn bacteria in each second will be k times the number of bacteria. This can be written as a differential equation: $${dN \over dT} = kN$$
You could model this in a program like Excel where you calculate the number of bacteria present in the next time step based on the number currently present and k. You'd get a graph like the one above. This also gives exact numerical results but is similarly limited to things that can be described exactly by an equation.
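The spreadsheet calculation described above can also be sketched in Python: each time step adds k times the current population. The function name `step_by_step` and the values used (k = 0.1, 50 steps, one starting cell) are assumptions for illustration only.

```python
def step_by_step(n0, k, steps):
    """Advance the population one time step at a time: N_next = N + k * N."""
    history = [n0]
    n = n0
    for _ in range(steps):
        n = n + k * n  # newborns this step are k times the current population
        history.append(n)
    return history

# Illustrative values, not taken from the course.
history = step_by_step(n0=1.0, k=0.1, steps=50)
print(history[:5])
print(f"after 50 steps: {history[-1]:.1f}")
```

With one step per unit of time, this sketch follows (1 + k)^T, which stays close to e^{kT} when k is small; taking shorter time steps brings the two closer together.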

Agent-based simulation. This is what we'll be doing in this class. In this case, you create a finite population of agents and give each of them a fixed chance of reproducing in each 'time step'. For each time step, the computer 'rolls the dice' using the growth rate "k" for each agent one-by-one and decides if it will reproduce in that time interval or not. If an agent reproduces, a new agent is added to the population. This also yields exponential growth but the line isn't so smooth because the population is finite and the random fluctuations are therefore more visible. One example of exponential growth from the project "Predator Prey" (which you will build later in the semester) is shown below:
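Separately from that StarLogo Nova project, the dice-rolling procedure described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `agent_based_growth`, the starting population of 10, and k = 0.1 are made up for the example, and the agents are represented by a simple count rather than as individual objects moving in a world.

```python
import random

def agent_based_growth(n0, k, ticks, seed=None):
    """Each tick, every agent independently reproduces with probability k."""
    rng = random.Random(seed)
    population = n0
    history = [population]
    for _ in range(ticks):
        # 'Roll the dice' once for each agent present at the start of this tick.
        births = sum(1 for _ in range(population) if rng.random() < k)
        population += births
        history.append(population)
    return history

# Three runs with the same (illustrative) settings give three different curves.
for run in range(3):
    print(agent_based_growth(n0=10, k=0.1, ticks=30))
```

Each run grows roughly exponentially, but no two runs are identical unless you fix the `seed`, which is the bumpiness described above.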

3) Important Features of Simulations in this Course.

Here are a few other guiding principles for our work this semester:

Simulations must be 'hands off'.

Although our simulations will have some control buttons (more on that later), it is very important that our simulations be 'hands-off'. That is, once you start the simulation, the agents themselves must control the action. So, for example, if you have agents that need to hunt other agents, the agents themselves have to target the prey - you cannot point them at prey using the keyboard or mouse. The simulations, once started, should run themselves.

Simulations not re-enactments.

This is best explained by an analogy to a stage show. A re-enactment is like a play: all the action is completely scripted. Each actor knows exactly where to stand, when to move, and what to say at each step in the process. We will not build things like this as the outcome is absolutely determined - the play always ends the same way. A simulation is more like improvisational theater - each actor is given a personality and a set of tendencies. The action comes about as they interact based on their personality. The show has no director and will therefore be a little different each time. We will program agents with 'personalities' and then observe what happens. Sometimes, the result can be quite surprising - and that is where you learn the most.

Simulations must be biologically reasonable.

This is very important for building useful simulations and also a place where a lot of learning about biology happens. Here are a few examples of what I mean:

Cells and molecules cannot see. They don't have eyes so they can't point themselves towards or away from other agents, period. We teachers are often sloppy and make it sound like enzymes hunt after substrates but, in truth, they just bump into each other randomly. Cells can chemotaxe towards or away from molecules but they use a very different method than animals with eyes do (which we can model in SLN). Only things with eyes can hunt.

There is no god. I don't mean this theologically. I mean that there is no agent with a global view and ability to act globally. So, for example, you can't have a simulation of the endocrine system automatically delete all the insulin agents if the number of insulin agents gets above 1000. Technically, you can code this in SLN, but you may NOT because there's no way a body can count all the insulin molecules nor is there a way for the body to simultaneously delete them all. In real life, insulin is removed gradually, which can be modeled using a 'half-life' (we'll meet this later on). Agents must act locally, not globally.
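As a rough illustration of what a local rule looks like, the sketch below gives each insulin agent its own small chance of breaking down in every tick, so that no agent ever counts or deletes the whole population. It is written in Python rather than SLN, and the names (`decay_chance_per_tick`, `surviving_after`) and numbers (1000 agents, a half-life of 20 ticks) are assumptions for illustration only; the course's own treatment of half-life comes later.

```python
import random

def decay_chance_per_tick(half_life_ticks):
    """Per-tick breakdown chance that, on average, halves a population every half_life_ticks."""
    return 1.0 - 0.5 ** (1.0 / half_life_ticks)

def surviving_after(n_agents, half_life_ticks, ticks, seed=None):
    """Each agent 'decides' for itself, tick by tick, whether it breaks down."""
    rng = random.Random(seed)
    p = decay_chance_per_tick(half_life_ticks)
    alive = n_agents
    for _ in range(ticks):
        # One independent dice roll per surviving agent; nobody counts the total.
        alive = sum(1 for _ in range(alive) if rng.random() >= p)
    return alive

# Illustrative values: after one half-life, about half of the agents should remain.
print(surviving_after(n_agents=1000, half_life_ticks=20, ticks=20))
```

The population shrinks smoothly without any global rule: the total is a consequence of many local events, not something any agent knows.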

Very few of our agents have built-in timers. For the most part, organisms, cells, and molecules measure time by the accumulation or depletion of molecules and/or by random chance - not by some internal clock. There are exceptions like circadian rhythms, gestation times, seasons, and, sometimes, the course of diseases, but these are relatively rare. Only people carry watches.

Setting some basic assumptions.

You will need to make some decisions, typically at the beginning of your project, to keep your project manageable and realistic. These include:

Know where to draw the line. Technically, if you were simulating lions eating gazelles, you could simulate the zillions of individual atoms in each of the animals but this would be tedious to program and, more importantly, not necessary if you're interested in predator-prey relations. Likewise, if you're simulating a biochemical pathway, it is not necessary to simulate the organism in which it is acting. You'll have to decide how far up and down the scale your simulation will extend. There are no rules for this; it depends on the questions you're asking. I will help you with this.

Be careful about time. As we'll see, these simulations are based on 'ticks' of the simulation clock. It is up to you to decide whether each tick is a nanosecond, a second, a day, or a century. There are no rules here either. However, it is important to be consistent. For example, if you're simulating people in an epidemic and they take one step each tick, then a tick represents a few seconds of real time. That then means you should think carefully about how big your world is - in a crowded world, a person might bump into 10 people in 100 ticks and you need to decide if that's reasonable.
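One way to check this kind of consistency is to convert ticks into real time explicitly. In the Python sketch below, the distance covered per tick (3 m) and the walking speed (1.5 m/s) are rough assumptions chosen only for illustration, as are the function names; substitute values that suit your own system.

```python
def seconds_per_tick(distance_per_tick_m, speed_m_per_s):
    """Real time represented by one tick if an agent moves one step per tick."""
    return distance_per_tick_m / speed_m_per_s

def ticks_to_minutes(ticks, tick_seconds):
    """Convert a number of ticks into minutes of simulated real time."""
    return ticks * tick_seconds / 60.0

# Assumed values: one step in the world stands for 3 m, walked at 1.5 m/s.
tick = seconds_per_tick(distance_per_tick_m=3.0, speed_m_per_s=1.5)
print(f"one tick represents {tick} seconds")
print(f"100 ticks represent {ticks_to_minutes(100, tick):.2f} minutes")
```

Under these assumptions, bumping into 10 people in 100 ticks means 10 encounters in a few minutes of real time, which is the number you would then judge as reasonable or not for the setting you are modeling.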

4) Some tips for success.

Learning to code

Students come to this class with different levels of coding experience so I've designed the exercises to work for everyone. They provide extensive support for newbies that you can choose to use or not use as you gain skill and confidence as a programmer. It's a little like learning to ride a bike without training wheels - you can gradually take them off as you get steadier. Here's one way to think about your progression. Everyone will get to the highest level by the end of "boot camp" but each of you will take your own path. I and your classmates will help you along the way so that everyone makes it to level 4.

• Level 1. This only applies to NetLogo. In this case, you can download working code for each project directly to your computer and run it. It is guaranteed to run. If you're working at this level, my minimum expectation is that you will 'play' with the code - make changes and try to figure out how it works.
• Level 2. Look at the screenshots in the instructions and duplicate them exactly in your own projects. These should work if you copy them exactly. Here, my minimum expectation is that you get them working and play a little to understand how it works.
• Level 3. First, look at my description of what the code should do; then, try to build some code without 'peeking at the answer'. Test your code and adjust it based on what it does (or doesn't do) and the screenshots of my code.
• Level 4. Start coding based on what you want the code to do and then debug until it works. Your project will be something that nobody has ever done before so there's no way to look up the right answer - this is when you really 'take off' in the class. You will be the expert on your project - not me.

Some phrases to live by in SimLife:

• Always be coding. The projects in this class are BIG - they work best if you chip away at them bit by bit.
• Ask for and give help. We've all been newbies at one time or another and we'll all develop skills that can help each other. I and your classmates are here to help.
• Get up and walk around if you get stuck. Sometimes, just standing up will rattle your mind and you'll see a solution to something that had you stumped.
• Try it; you can't break it. It is not possible to do permanent damage to your computer no matter what you code in SLN. This gives you license to try anything since you can't break it. If you're going to try something radical, use SLN's "remix" or NetLogo's "Save As..." to keep your original version safe. If you wonder why something is the way it is, how something works, what would happen if you changed something - TRY IT! It will help you learn; it's all good.
• There's (often) more than one way to do it. Often, you can think of more than one way to write code to do a particular thing. Some are simpler, some are easier to understand or explain, some are more error-prone, and some allow easier extension to new behaviors. The choices will be up to you with guidance from me and your classmates.
• How could this be fooling me? Lots of times, simulations look like they're working when they really aren't. Test fiercely at every step.
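One way to act on that last phrase is to compare a simulation against a case where you already know what the answer should be. The Python sketch below is an illustration only (the function names, seed, number of runs, and settings are all assumptions): it averages many runs of a simple agent-based growth model and compares the mean with n0 × (1 + k)^T, the average expected when every agent reproduces with chance k per tick.

```python
import random

def grow(n0, k, ticks, rng):
    """One run: every agent reproduces with probability k in each tick."""
    population = n0
    for _ in range(ticks):
        population += sum(1 for _ in range(population) if rng.random() < k)
    return population

def mean_final_population(n0, k, ticks, runs, seed=0):
    """Average the final population over many independent runs."""
    rng = random.Random(seed)
    return sum(grow(n0, k, ticks, rng) for _ in range(runs)) / runs

# Illustrative settings, not taken from the course.
n0, k, ticks = 20, 0.1, 20
expected = n0 * (1 + k) ** ticks
observed = mean_final_population(n0, k, ticks, runs=500)
relative_error = abs(observed - expected) / expected
print(f"expected {expected:.1f}, observed {observed:.1f}, "
      f"relative error {relative_error:.3f}")
```

A single run can look plausible and still be wrong; if the average over many runs sits far from the known answer, something in the rules is not doing what you think it is.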