GDC 2005 Proceeding: Online Game Architecture: Back-end Strategies

In this GDC 2005 proceeding, Esbensen comments: "Globally, the MMORPG market is expected to reach $3.2 billion dollars in the next two years... Yet, in spite of these numbers, many businesses are hesitant to invest their resources because the high cost of maintaining the necessary infrastructure outweighs substantial profits. How is a company supposed to cope?"

March 10, 2005

Author: Dan Esbensen

The Dark Side of MMORPGs: Infrastructure Gone Mad

Massively Multiplayer Online Role-Playing Games are one of the fastest-growing markets in the industry. Globally, the MMORPG market is expected to reach $3.2 billion in the next two years. Forecasters expect that baby-boomers, early retirees, new parents, and women will greatly expand the population of online gamers. Yet, in spite of these numbers, many businesses are hesitant to invest their resources because the high cost of maintaining the necessary infrastructure can outweigh even substantial profits. How is a company supposed to cope with the growing infrastructure needs of this expected boom?

Currently, the infrastructure solutions available are hardware- and network-intensive. Successful games like EverQuest require hundreds - even thousands - of clustered server boxes working together in what is virtually a redundant network. (A server is defined as a single box containing multiple CPUs. A clustered server, or cluster, is a group of many server boxes working together.) Each server box can handle 200 to 300 players. Given those statistics, 100,000 online players require roughly 330 to 500 server boxes, plus an equal number of redundant server boxes for fail-over.
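The arithmetic behind those figures can be checked with a short sketch. It assumes only the numbers quoted above (200 to 300 players per box, one redundant box per primary box); the function name `boxes_needed` is hypothetical and introduced purely for illustration.

```python
import math

def boxes_needed(players: int, players_per_box: int, failover: bool = True) -> int:
    """Server boxes needed for `players`, doubled when every box has a redundant twin."""
    primary = math.ceil(players / players_per_box)
    return primary * 2 if failover else primary

# Evaluate both ends of the 200-300 players-per-box range quoted in the text.
for capacity in (200, 300):
    primary = boxes_needed(100_000, capacity, failover=False)
    total = boxes_needed(100_000, capacity)
    print(f"{capacity} players/box: {primary} primary, {total} with fail-over")
```

At 300 players per box the count is 334 primary boxes; at 200 players per box it is 500. Doubling for fail-over gives 668 to 1,000 boxes in total.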

Imagine a football field-sized room crammed with whirring, hot computers, and you'll get a picture of what a successful MMORPG requires. The price of air conditioning needed to keep these enormous rooms cooled and the hardware running properly is enough to cause a corporate accountant to tremble with fear at the mere mention of a new online game. Add the cost of dual-processor boxes, networking, other hardware, and skilled personnel to oversee and maintain the architecture, and the profit margin virtually shrinks down to nothing. Yet, such a high-maintenance, high-cost infrastructure system seems to be an unavoidable solution to the growing number of gamers, and game companies have simply accepted the fact that their bottom line profits will be negatively affected.

Because of these online infrastructure costs, big businesses will think twice before investing in the next great MMORPG. Even while subscribers to online games continue to increase at an exponential rate, the solutions available to maintain a burgeoning online game are so expensive that some companies have killed projects in mid-gestation, throwing away millions of dollars spent on game development because the infrastructure required to run the game would make it unprofitable. All the while, smaller, independent game companies are virtually excluded from entering the competition at all. And we're talking about a monster market - a monster, remember, that is still in its infancy.

The Independent Game Company's Dilemma

Scores of new online games are developed by independent game companies every year. Many of these games become cult classics and maintain a large following of loyal, appreciative gamers. However, maintaining an online game is a two-edged sword for smaller companies. Often, a company will inform hundreds or even thousands of online players that they will be shutting down a game due to financial problems.

Ironically, it is often the very success of an independent online game that brings forth its ultimate demise. This is because more players require more infrastructure and infrastructure maintenance in order to allow the game to continue to be played properly. A smaller company cannot afford the hardware, the networking boxes, the skilled personnel, or the space to expand the infrastructure and keep up with their success.

So these smaller companies are placed in a very difficult position with the very game into which they invested their hearts, souls, time and finances. They must either sell their pride and joy to a larger corporation or kill the game altogether. Or, if they want to hold on to their game, they are forced to keep using the existing infrastructure while allowing the number of players to grow. But sooner or later, the independent company will find itself riddled with problems. Not only will the game itself begin exhibiting lag problems that cause an unpleasant gaming experience for the players, but the employees will become overworked and most likely leave. What was once a very-likely-to-succeed independent company sadly falls by the wayside.

Since projections predict exponential growth in the number of gamers, the cost of online infrastructure will become a barrier that prevents small and new companies from even attempting to follow their dreams of creating the next great MMORPG. The absence of these independent gaming companies from the MMORPG race will result in a loss of competitive creativity.

An Alternative Solution

Until recently, the hardware- and network-intensive infrastructure described above has been the typical situation. But now there is an efficient way to solve this problem that will not only greatly increase the profits for huge corporations that develop and publish MMORPGs, but will also allow the smaller, independent game companies to join in on the fun. In addition to saving space, conserving electricity, decreasing hardware and networking costs, and raising profit margins, simplifying the back-end architectural design of the infrastructure will decrease lag by up to 98 percent. If there is anything that annoys a player - often causing him or her to quit playing a game altogether - it is the aggravating problem of lag (described in detail below) along with the even more annoying child of lag: going link-dead in mid-game. This can result in the player losing a highly desired game or quest item in the process.

We at Touch Technologies, Inc. have applied our knowledge of non-game network surveillance, Internet video surveillance and numerous other infrastructure-based performance technologies to solving the pervasive problem of an online MMORPG infrastructure gone out of control. IGame, the working title of this simple, efficient infrastructure, is a cutting-edge technology that proves that "thinking out of the box" keeps the boxes necessary for maintaining an online game - whether they be hardware or network - cool, cheap, lag-free and happy.

Scalability, Reliability and Speed

By increasing the number of players per server from 200 to over 10,000, the amount of hardware running 24 hours a day is radically reduced. This server reduction allows popular games to become free of server overload instead of becoming sluggish, virtual tombs that are boring to play. Most online gaming aficionados want to experience a game world in which they will interact with other players - otherwise, they would be spending their time playing single-player games. With the implementation of the new highly efficient back-end architecture, using hundreds of servers to handle 100,000 players becomes an archaic notion.
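To put the claimed jump in per-server capacity in perspective, the sketch below compares the number of primary boxes needed for 100,000 players at the best current figure (300 per box) and at the 10,000-per-server figure claimed above. Fail-over boxes are left out, and `primary_boxes` is a hypothetical helper name.

```python
def primary_boxes(players: int, players_per_box: int) -> int:
    # Integer ceiling division: a partly filled box still counts as a whole box.
    return -(-players // players_per_box)

PLAYERS = 100_000
traditional = primary_boxes(PLAYERS, 300)    # best case quoted for current servers
proposed = primary_boxes(PLAYERS, 10_000)    # per-server capacity claimed in the text
reduction = 1 - proposed / traditional

print(f"{traditional} boxes -> {proposed} boxes ({reduction:.0%} fewer)")
```

Under these assumptions, 334 boxes shrink to 10, a reduction of about 97 percent.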

With current technology, players who sign on too late in the peak hours of gaming time are forced to spend their precious gaming time milling around in the adrenalin-lacking, empty world of the "dead" server, while their friends are having a blast fighting huge battles or completing high-end quests.

Anyone who has ever been thrown off a server by going link-dead without warning, only to find that they cannot get back on to the popular server, knows the acute disappointment of being alienated from one's gaming friends. A disappointed gamer will soon be seeking a new venue where he can log on without worrying about rushing home from work or school just so he can get on his favorite server before it gets overloaded during peak hours.

This new technology solves the problems of overworked servers and ensures a highly scalable game. It allows for the creation of game worlds or expansion packs where, at the high end of things, millions of players can simultaneously interact. Imagine the possibilities! A server that can handle 10,000 players will greatly change those games which rely heavily on Player vs. Player or Realm vs. Realm battles to entertain players! MMORPGs will be able to literally create new forms of gaming, giving players the opportunity to indulge in combat and battles on an epic level. Now that's an adrenalin rush!

Even better, the simplicity of this architectural breakthrough actually increases reliability, as it includes built-in, fault-tolerant mechanisms that provide high availability while reducing downtime, which, of course, affects revenue. In addition, because of the elegant simplicity of the technology, infrastructure rollout time is reduced to a fraction of the standard.

Traditional approaches require the set-up of thousands of servers, hundreds of clusters, and a complex, time- and cost-intensive network consisting of switches, routers, and various other bits of hardware. The reduction of infrastructure costs allows for both the preparation of a new generation MMORPG, along with the expansion of existing online games. It is a quick, elegant and financially feasible solution for both large and small companies.

Conquering Lag and Latency

Lag and latency are probably the most irritating problems for players, and they are a devil for programmers to keep at bay. As players demand higher-end graphics and sophisticated eye-candy to furnish their online worlds and facilitate a more immersive gaming experience, lag and latency worsen because these "pretty pictures" can slow down even the fastest CPU to a snail's pace, or even stall it altogether - causing the dreaded "link-dead" scenario.

The common definition of lag in the gaming world is the time that elapses between the event where a player requests an experience and the event where a player sees and hears that experience on his or her computer screen.

For reasons explained below, lag occurs when there is a delay somewhere in the following process: a player's PC sends a "message," the server acknowledges that message, and the server sends its reply back to the player who requested the event, as well as to nearby players who are affected by it. An efficient method for a player's PC to communicate action messages to a server, which then communicates an "update" message back to the initiating player's PC (as well as to other affected players), is essential for an online MMORPG to exist.

The current infrastructure implemented in most MMORPGs utilizes the following standard to transmit messages between player and server and back again. For purposes of clarity, we will use a simple action of striking an enemy with a glowing sword as an example:

A) An action message composed of several segments called "packets" is sent by a player's PC to the game server, communicating that the player wants to "strike" an enemy with a glowing sword. At this point, the player cannot send further messages until this message - including ALL of the packets that comprise it - is received by the server, processed by the game engine, and an update message is sent back to the player. The moment a player's PC initiates an action, the PC is locked up until the above scenario plays out. In our scenario, the requested action message is segmented into four packets.

B) The action message arrives at the server and is placed in a queue with various messages sent from hundreds, or thousands, of other player PCs. The position of a message in the queue will determine how a message will be handled by the game's engine, and when it will arrive there.

C) The game engine is given the action message. The engine deciphers the action message and calculates the necessary information to determine the outcome of the action "strike with glowing sword." The engine will consider many factors, including the statistics of the player and the enemy, the properties of the glowing sword, and any other random factors.

D) In this example, the game engine determines that the opponent will die and communicates this event to the server. Finally, the game server sends the packets of information that comprise the update message back to the player's PC, along with all potentially affected players' PCs, thus confirming the sword strike and kill. The player and his or her fellows will now experience the action as it is displayed on their computer screens, and the player who initiated the action message is now free to initiate further game play.

The above, of course, is a flawless situation. The action message and update message have been sent, received, and returned in a perfectly flowing chain of events. The players are happy because they feel in control of their characters, enjoying non-stop action and complete game immersion.
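The flawless exchange in steps A through D can be sketched as a toy simulation. This is only an illustration of the all-packets-before-processing rule described above, not the code of any real game server; the `Packet`, `Server`, and `split_message` names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Packet:
    message_id: int
    index: int        # position of this packet within the message
    total: int        # how many packets make up the whole message
    payload: str

@dataclass
class Server:
    partial: dict = field(default_factory=dict)   # message_id -> {index: payload}
    queue: deque = field(default_factory=deque)   # complete messages awaiting the engine

    def receive(self, packet: Packet) -> None:
        parts = self.partial.setdefault(packet.message_id, {})
        parts[packet.index] = packet.payload
        # Step B: a message is queued only once ALL of its packets have arrived.
        if len(parts) == packet.total:
            action = "".join(parts[i] for i in range(packet.total))
            self.queue.append((packet.message_id, action))
            del self.partial[packet.message_id]

    def run_engine(self) -> list:
        # Steps C and D: resolve each queued action and produce an update message.
        updates = []
        while self.queue:
            message_id, action = self.queue.popleft()
            updates.append((message_id, f"resolved:{action}"))
        return updates

def split_message(message_id: int, action: str, count: int) -> list:
    # Step A: segment the action message into `count` packets.
    size = -(-len(action) // count)   # ceiling division
    chunks = [action[i * size:(i + 1) * size] for i in range(count)]
    return [Packet(message_id, i, count, chunk) for i, chunk in enumerate(chunks)]

server = Server()
packets = split_message(1, "strike enemy with glowing sword", 4)
for packet in packets[:3]:
    server.receive(packet)
print(server.run_engine())   # empty: one packet is still outstanding
server.receive(packets[3])
print(server.run_engine())   # the action is resolved only now
```

The first call to `run_engine` returns nothing because the fourth packet has not arrived; the engine sees the action only after the message is complete.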

Real-time gaming, as both gamers and those who work in the industry know, rarely flows so smoothly. Instead, lag and latency raise their ugly, multitudinous heads, causing frustrating multi-second delays. Often, after staring at an unresponsive screen for what seems like an eternity, a player finds himself or herself looking at a screen from which the series of events that led up to the death of an enemy is missing. That glowing sword that was won by completing a four-hour quest, the one that the entire guild is raving about because of the cool moves it shows off when wielded and struck, is back in its sheath. The enemy is dead, but the player did not witness his character fighting. What happened to the 10-second random, kick-butt parry auto-fight that the player had been waiting three hours to experience?

Simply put, the problem is caused by the less-than-optimal client-server messaging systems that are typically being used by most online MMORPGs. More specifically, the primary cause of lag and latency lies in packet loss.

In the common lag scenario noted above (wherein our player experiences a frozen screen or a series of choppy, blurred movements, loses control of his or her character and is cast out of the paradise of game world immersion), what happened was this:

A) The action message sent by the player's PC must await acknowledgment of the entire message before game play can proceed. When message packets get lost, delays occur. Remember, the server cannot process any part of the action message until it receives ALL the packets. The server knows this and is set to wait a few seconds before timing out on the message. It is the waiting for all the packets to arrive at the server which causes "common" lag. In the worse case, one or more packets are lost, the server cannot process any part of the action message, and the server times out and sends a message (also comprised of packets) to the player's PC saying that the action must be re-sent. While all of this is going on, the player just obsessively taps on the keyboard or mouse as soon as his or her computer allows. These segmented messages become even more complex because the packets that comprise the re-transmit message can also get lost during the server's attempt to send the "failure" message to the player. When the re-transmission of the packets fails, the player's PC can lock up completely - either causing the player to go "link-dead" or freezing the PC so that it has to be completely re-booted.

B & C) The entire action message is never processed by the server, as one or more of the packets are lost. The action message never gets placed into the queue. Therefore, the game engine does not receive the message and cannot calculate and determine who wins the sword fight.

D) Even if the action message, including all of its segmented packets, successfully passes through the trials of A, B, and C, a second message, the update message, now needs to be sent to the player's PC (along with the PCs of all other affected players). Again, the successful transmission of this message is dependent on the delivery of all the packets. This final step is even more complex, as there may be many player PCs that need to be updated, especially in a large raid. Lag, as well as link-death, frequently becomes more common in highly populated areas or during raids. Often, gamers will "know" that enemies are nearby because they are suddenly afflicted by lag. And we all know that there is nothing more disappointing than going link-dead during a raid, which can cause everything from your entire group being killed off, to being stranded in an area where you are certain to die and lose hard-earned experience.
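A rough calculation shows why requiring every packet of a message hurts. The sketch below assumes that packets are lost independently and uses an illustrative 5 percent loss rate; neither assumption comes from the article, and `complete_delivery_probability` is a hypothetical helper.

```python
def complete_delivery_probability(packets: int, loss_rate: float) -> float:
    """Chance that every packet of a message arrives, assuming independent losses."""
    return (1.0 - loss_rate) ** packets

LOSS_RATE = 0.05   # illustrative assumption, not a figure from the article

for packets in (1, 4, 8):
    one_way = complete_delivery_probability(packets, LOSS_RATE)
    # The action message and the update message must both arrive intact,
    # and both are assumed here to use the same number of packets.
    round_trip = one_way ** 2
    print(f"{packets} packet(s): one-way {one_way:.1%}, round trip {round_trip:.1%}")
```

Under these assumptions, a four-packet message arrives complete only about 81 percent of the time, against 95 percent for a single packet, and the gap widens once the update message has to make the return trip as well.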

A 90% Reduction in Protocol Packets Results in a 98% Reduction in Lag

The IGame approach tackles the problem of lag by elegantly restructuring and simplifying the messaging mechanism while developing a more efficient queuing system. In lieu of a complex message comprised of several data packets, the solution lies in handling data messaging with a single packet filled with both game data and mathematically redundant prior message data. This technology has been applied to live-feed video imaging for the past four years and is tried and true, just waiting for the opportunity to soar into the great horizon of the online MMORPG industry, where it is much needed.
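The article does not specify how the redundant data is encoded, so the following is only a sketch of the general idea under a simplifying assumption: each packet carries plain copies of the last few messages instead of a mathematical encoding. The class names and the `depth` parameter are hypothetical and are not part of IGame.

```python
from collections import deque

class RedundantSender:
    """Packs each new message together with copies of the most recent earlier ones."""

    def __init__(self, depth: int = 2):
        self.history = deque(maxlen=depth)   # (sequence, payload) of prior messages
        self.sequence = 0

    def build_packet(self, payload: str) -> dict:
        packet = {"seq": self.sequence, "payload": payload,
                  "redundant": list(self.history)}
        self.history.append((self.sequence, payload))
        self.sequence += 1
        return packet

class RedundantReceiver:
    """Delivers messages in order, filling gaps from the redundant copies."""

    def __init__(self):
        self.next_seq = 0
        self.delivered = []

    def receive(self, packet: dict) -> None:
        # First recover any missed earlier messages carried inside this packet.
        for seq, payload in packet["redundant"]:
            if seq == self.next_seq:
                self.delivered.append(payload)
                self.next_seq += 1
        # Then deliver the packet's own message if it is the next one expected.
        if packet["seq"] == self.next_seq:
            self.delivered.append(packet["payload"])
            self.next_seq += 1

sender = RedundantSender(depth=2)
receiver = RedundantReceiver()
packets = [sender.build_packet(a) for a in ("draw", "strike", "parry", "sheathe")]
for packet in packets:
    if packet["seq"] in (1, 2):   # simulate two packets lost in transit
        continue
    receiver.receive(packet)
print(receiver.delivered)   # all four actions, with no retransmit request
```

In this toy version a run of lost packets longer than `depth` cannot be recovered, so a real system would still need a fallback such as a retransmit request.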

Let's resurrect the above example and apply the IGame solution to it:

A) An action message is sent from the player's PC requesting the glowing sword to strike an enemy. This action message, unlike the typical scenario, contains only a single packet that consists not only of the current data necessary for this action message, but also of redundant back-up message data. The redundancy data is used to mathematically re-create potentially lost messages without the need to request a retransmit of the message. Of course, the fact that the IGame messaging system eliminates multiple message packets (and the fragmentation that goes hand-in-hand with existing systems) further decreases the likelihood of lost messages. Actual statistics show that 98 percent of lost messages can be recovered through this revolutionary redundancy back-up system.

B) The development of a more efficient queuing system is the second part of this ground-breaking technology. Instead of sending action messages to a queue where they will wait for the associated packets to arrive before any action is taken, the new technology contains a receiver component that handles message reconstruction and validation PRIOR to handing messages over to the game engine. This keeps the game engine from getting bogged down by unnecessary messages. In fact, it will only give the game engine access to pristine, validated messages. Besides greatly expediting the processing of the glowing sword strike, the receiver examines events for validity. For example, if a player tried to trick the engine into doing something illegal with his new glowing sword (perhaps through a bug that has not been fixed yet), the cheat would be nipped in the bud before it found its way into the game engine. Thus, cheating becomes obsolete.

C) To further speed up game engine processing, the new architecture includes its own high-speed database engine. This will significantly reduce the calculation load that the game server must deal with, facilitating the processing of physics, collision detection, and other such matters. The physical details of the glowing sword fight may already have been transmitted to the player's PC in a previous redundancy message, thereby decreasing the normal wait time for this action message to update on the player's screen.

D) The update message is sent more quickly. The moment the game engine determines that an enemy is killed by the player with the glowing sword, a single packet is sent only to directly affected PCs, eliminating a significant portion of update messages.
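Steps B and D can be illustrated with a small sketch: a receiver that rejects invalid requests before the engine sees them, and an update that goes only to players near the action. The rule set, the 50-unit radius, and all of the names below are hypothetical and chosen for illustration; the article does not describe how IGame decides what is valid or who is "directly affected."

```python
from dataclasses import dataclass

@dataclass
class Player:
    name: str
    x: float
    y: float
    inventory: frozenset

ALLOWED_ACTIONS = {"strike", "parry"}   # hypothetical rule set
UPDATE_RADIUS = 50.0                    # hypothetical "directly affected" range

def validate(action: dict, players: dict) -> bool:
    """Reject malformed or illegal requests before the engine ever sees them."""
    actor = players.get(action.get("actor"))
    if actor is None or action.get("verb") not in ALLOWED_ACTIONS:
        return False
    # A player may only act with an item he or she actually holds.
    return action.get("item") in actor.inventory

def affected_players(actor: Player, players: dict) -> list:
    """Names of players close enough to the actor to need the update."""
    return sorted(
        p.name for p in players.values()
        if (p.x - actor.x) ** 2 + (p.y - actor.y) ** 2 <= UPDATE_RADIUS ** 2
    )

players = {
    "ana": Player("ana", 0, 0, frozenset({"glowing sword"})),
    "ben": Player("ben", 30, 40, frozenset()),
    "cy": Player("cy", 300, 400, frozenset()),
}
request = {"actor": "ana", "verb": "strike", "item": "glowing sword"}
if validate(request, players):
    print(affected_players(players["ana"], players))   # cy is too far away to be updated
```

Here a request from a player who does not hold the sword fails validation, and the distant player receives no update at all.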

Correcting Faulty Infrastructure and More

Certainly, it would be a shame if faulty infrastructure, poor message error detection and correction, and the obstacles these problems create for both seasoned corporations and up-and-coming independent contenders hindered or even ended the fast-paced progress of next-generation MMORPGs. Fortunately, the aggravations of football field-sized rooms full of gaming hardware, low profits, the success-equals-failure dilemma of smaller game companies, and lag-weary players can be solved.

Could it be as easy as looking at the problem from a different perspective and thinking out of the box (or, in this case, the hundreds of thousands of server and network boxes)? IGame technology has proven this to be true. The equation is a simple one: 1) An increase in messaging packets means more data to process, and more data means that more server boxes are necessary to process such data. (Lower speed database engines require server boxes as well.) 2) An increase in the number of server boxes requires more network connections, which in turn require more network boxes, and so on. This inelegant, tangled mass of message packets and equipment is not only complicated, it is EXPENSIVE to maintain. Furthermore, each complication, every one of the potential millions of unnecessary messaging packets, introduces another point of potential failure. As a house of cards attests, the more complex the system, the greater the likelihood of catastrophe.

The Bottom-line Summary

The current infrastructure development and maintenance costs of MMORPGs are astronomical. However, the implementation of a beautifully efficient, technologically sound online gaming infrastructure - which optimizes speed and cuts costs by reducing unnecessary data transmissions - is a viable solution. In addition to addressing the big problems of current infrastructure solutions, the new technology optimizes scalability, reliability, infrastructure roll-out time, security, remote management, latency detection and compensation, and connection protocol.

The end result is an advantageous technology that will benefit big corporations while giving independent companies a chance. Now an independent MMORPG company will be able to scale up its small, successful game with a minimal investment in hardware, networking, skilled personnel, and space, while new game companies will not be discouraged from trying their hands at developing their online dream game.

At the same time, large corporations will be able to substantially increase profits, allowing them to maintain enough skilled personnel that employees are no longer overworked, made unproductive, or driven to quit mid-game by 80-hour work weeks and burnout. Corporations will be able to spend more money on research and development, pushing the envelope, and creating the next generation of MMORPGs, the ones that everyone will have to experience. The time for a new look at the back-end architecture of MMORPGs is now. MMORPGs deserve an infrastructure cradle in which this gentle giant, this infant monster, can grow without boundaries.
