Normal Topic Hotfix was for lag apparently (Read 1633 times)
arkohighstar
Ex Member


Hotfix was for lag apparently
Nov 22nd, 2011 at 4:28pm
http://forums.ddo.com/showthread.php?&postid=4184565#post4184565

Quote:
I might be able to shed some light on the 'crashes'. This is my conceptual understanding; I'm not an engineer so keep that in mind.

The crashes that the hotfix addressed are not client crashes - so if you were seeing some of these, they won't be fixed with yesterday's hotfix.

For each shard/world there is a machine running a master copy of the server plus many 'instances' of that server. Each instance (basically a copy of the server) handles a certain number of players, spreading the entire load of players amongst the instances. If one of these instances crashes, the remaining instances take on the duties of the failed instance (i.e. handling that player load).

However, if the cause of the server instance crash is not a rare event, more server instance crashes will occur, causing more load to be handled by less CPU power. This results in lag. We have been experiencing the crashes for some time and there was a potential fix in U12. Unfortunately it didn't fix the problem, but it did provide some logging info to help track down the source of the problem (these can be very tricky to trace). The logging identified a problem; it was fixed and went live with yesterday's hotfix.

Note that this fix doesn't mean that all lag magically disappears; there are many potential sources/causes of lag and each one needs to be addressed separately.


I could see the huge lag spikes being caused by us getting switched over to a new server instance when ours crashes. As for the crashing itself, my bet is runaway memory use or bad garbage collection, but that's just a guess.
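
The load-shifting effect described in the quoted post can be put into rough numbers. This is only a sketch: the even split of players, the function name `players_per_instance`, and the figures (1000 players, 10 instances) are assumptions made up for illustration, not anything Turbine has published.

```python
def players_per_instance(total_players: int, instances: int, crashed: int) -> float:
    """Average player load on each surviving instance (hypothetical model).

    Assumes players are spread evenly and that the survivors absorb the
    players of every crashed instance, as the quoted post describes.
    """
    surviving = instances - crashed
    if surviving <= 0:
        raise ValueError("no surviving instances left to take the load")
    return total_players / surviving


if __name__ == "__main__":
    # Made-up figures: 1000 players spread over 10 instances.
    for crashed in range(6):
        load = players_per_instance(1000, 10, crashed)
        print(f"{crashed} crashed -> {load:.0f} players per surviving instance")
```

Under these assumptions the load per surviving instance doubles once half the instances are down, which would fit the "more load handled by less CPU power" explanation.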
  
 
Flav
Vault Frog
*
Offline


One Frog to Rule them
All!

Posts: 9984
Location: Land of the Frogs
Joined: Aug 29th, 2010
Gender: Male
Re: Hotfix was for lag apparently
Reply #1 - Nov 23rd, 2011 at 3:52am
Quote:
http://forums.ddo.com/showthread.php?&postid=4184565#post4184565


I could see the huge lag spikes being caused by us getting switched over to a new server instance when ours crashes. As for the crashing itself, my bet is runaway memory use or bad garbage collection, but that's just a guess.


I think he oversimplified the back-end architecture.
I'm sure there's more than one physical server running each world.

The way I see it, you have a cabinet of blade systems ( say 50 servers... ). Each server can handle a given number of instances simultaneously ( say 100 ).
Of these 50 servers, 2 or 3 are dedicated to database transactions ( they are not part of the instance cluster ), and 2 or 3 more are dedicated to coordination between all the servers ( they are part of the instance cluster but don't run actual instances; they are the ones that tell the other servers to spawn instances when necessary, or that move an instance from one server to another ).
Now, for reliability, every instance spawned is duplicated on two different blades. That way, when the instance crashes on one of the blades, the other blade can take over on the fly while the crashed instance is reset on the original blade.

I suspect that some of the lag spikes we encountered ( typical case: when somebody disconnects ) were caused by the instance crashing and us being transferred to the second blade running the instance we were in.
Since the blade acting as secondary in the above example is also handling other instances as their primary, we get lag when we are transferred to it after a crash, because we are the low-priority job.

Anyway, I'm not sure I'm being clear...
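
To make the primary/secondary idea a bit more concrete, here is a minimal sketch of the bookkeeping such a setup might need. Everything in it is hypothetical: the `Cluster` class, its method names, and the blade and instance names are invented for illustration, and the real back end may look nothing like this.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: every instance has a copy on two different blades."""

    copies: dict = field(default_factory=dict)   # instance -> (primary blade, secondary blade)
    serving: dict = field(default_factory=dict)  # instance -> blade currently serving players

    def spawn(self, instance: str, primary: str, secondary: str) -> None:
        """Duplicate a new instance on two different blades."""
        if primary == secondary:
            raise ValueError("the two copies must live on different blades")
        self.copies[instance] = (primary, secondary)
        self.serving[instance] = primary

    def crash_primary(self, instance: str) -> str:
        """The primary copy crashes: players are handed over to the secondary blade."""
        _, secondary = self.copies[instance]
        self.serving[instance] = secondary
        return secondary

    def jobs(self, blade: str) -> dict:
        """Instances a blade is serving, split by the role it holds for each."""
        result = {"primary": [], "secondary": []}
        for instance, serving_blade in self.serving.items():
            if serving_blade != blade:
                continue
            role = "primary" if self.copies[instance][0] == blade else "secondary"
            result[role].append(instance)
        return result


if __name__ == "__main__":
    cluster = Cluster()
    cluster.spawn("Market", "blade-1", "blade-2")
    cluster.spawn("Shroud", "blade-2", "blade-3")
    cluster.crash_primary("Market")
    # blade-2 now serves its own primary job plus the handed-over one.
    print(cluster.jobs("blade-2"))
```

After the crash, `blade-2` serves "Shroud" as its primary job and "Market" as a secondary one, which is the situation where the transferred players would be the low-priority job.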
  

Yes my avatar is an Hermine eating a Greenland Lemming for brunch.
 
Deathdefy
Waterworks Kobold
**
Offline


I Love Drama!

Posts: 169
Joined: Apr 29th, 2011
Re: Hotfix was for lag apparently
Reply #2 - Nov 23rd, 2011 at 4:21am
Got to say, the weird lag when getting the chests in the first parts of the Shroud seems to be completely gone for me.

Whatever the mechanics are, maybe it worked.
  
 
arkohighstar


Re: Hotfix was for lag apparently
Reply #3 - Nov 23rd, 2011 at 9:59am
Flav wrote on Nov 23rd, 2011 at 3:52am:
I think he oversimplified the back-end architecture.
I'm sure there's more than one physical server running each world.

The way I see it, you have a cabinet of blade systems ( say 50 servers... ). Each server can handle a given number of instances simultaneously ( say 100 ).
Of these 50 servers, 2 or 3 are dedicated to database transactions ( they are not part of the instance cluster ), and 2 or 3 more are dedicated to coordination between all the servers ( they are part of the instance cluster but don't run actual instances; they are the ones that tell the other servers to spawn instances when necessary, or that move an instance from one server to another ).
Now, for reliability, every instance spawned is duplicated on two different blades. That way, when the instance crashes on one of the blades, the other blade can take over on the fly while the crashed instance is reset on the original blade.

I suspect that some of the lag spikes we encountered ( typical case: when somebody disconnects ) were caused by the instance crashing and us being transferred to the second blade running the instance we were in.
Since the blade acting as secondary in the above example is also handling other instances as their primary, we get lag when we are transferred to it after a crash, because we are the low-priority job.

Anyway, I'm not sure I'm being clear...



No, that would describe it fairly well, I think. Whatever the mechanics, his comment suggests that switching to the backup instance is overloading the server. In other words, the failover is enough to keep us from crashing out altogether, but the box is not strong enough to stay performant under the extra load.
The real question is whether the whole server is crashing, dumping all its instances at once, or whether individual instances are being dumped.
  
 
Balthazar
Puppy Farmer
****
Offline


Time is of the essence!

Posts: 1594
Location: Argo
Joined: Jan 30th, 2011
Gender: Male
Re: Hotfix was for lag apparently
Reply #4 - Nov 23rd, 2011 at 10:57am
Flav wrote on Nov 23rd, 2011 at 3:52am:
I think he oversimplified the back-end architecture.
I'm sure there's more than one physical server running each world.

The way I see it, you have a cabinet of blade systems ( say 50 servers... ). Each server can handle a given number of instances simultaneously ( say 100 ).
Of these 50 servers, 2 or 3 are dedicated to database transactions ( they are not part of the instance cluster ), and 2 or 3 more are dedicated to coordination between all the servers ( they are part of the instance cluster but don't run actual instances; they are the ones that tell the other servers to spawn instances when necessary, or that move an instance from one server to another ).
Now, for reliability, every instance spawned is duplicated on two different blades. That way, when the instance crashes on one of the blades, the other blade can take over on the fly while the crashed instance is reset on the original blade.

I suspect that some of the lag spikes we encountered ( typical case: when somebody disconnects ) were caused by the instance crashing and us being transferred to the second blade running the instance we were in.
Since the blade acting as secondary in the above example is also handling other instances as their primary, we get lag when we are transferred to it after a crash, because we are the low-priority job.

Anyway, I'm not sure I'm being clear...


With some of the new virtualization tech you can actually aggregate multiple blades into one virtual server, so it could be one virtual mega server using the resources of multiple physical blades. This works both ways: you can scale up or down.
  

Any fool can criticize, condemn and complain and most fools do.
 
Flav
Re: Hotfix was for lag apparently
Reply #5 - Nov 23rd, 2011 at 11:50am
Quote:
The real question is whether the whole server is crashing, dumping all its instances at once, or whether individual instances are being dumped.


I suspect it's just the instance that crashes ( I suspect that a given instance is a thread spawned on a given server, either as a 'general open' [ Market, Harbor, Houses ] instance or as a quest instance ). And the lag spike is generated by the handover to the other server, not by the load... The lag generated by the load is always there and should be constant ( as long as your instance doesn't crash ).

Balthazar wrote on Nov 23rd, 2011 at 10:57am:
With some of the new virtualization tech you can actually aggregate multiple blades into one virtual server, so it could be one virtual mega server using the resources of multiple physical blades. This works both ways: you can scale up or down.


I didn't want to make things too complex...
Several virtual servers spread over several [ a different several ] physical blades, all in a cluster. I don't think they went the mega-server way with several virtual beasts: too much capital tied up in that mega server. It's cheaper to buy blade chassis and fit the number of blades needed... you can then add or remove them dynamically when the load changes, thus maximizing your revenue and minimizing your costs.

The other point is that the former European servers were hosted at an AT&T datacenter in the Netherlands... I don't expect CM to travel there from the UK every week to manage the server hardware. So something that can be soft-configured sounds more appropriate. You rent the hardware: when you need more, you ask for more to be added; when you need less, you release some of the hardware back to the provider.
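
As a back-of-the-envelope illustration of adding and removing blades with the load, here is a small sketch. The function name `blades_needed` is invented, and the defaults (100 instances per blade, 3 database blades, 3 coordination blades) are just the "say" figures guessed earlier in this thread, not real numbers.

```python
import math


def blades_needed(active_instances: int,
                  instances_per_blade: int = 100,
                  database_blades: int = 3,
                  coordination_blades: int = 3) -> int:
    """Blades to rent for a given load (all defaults are guesses from the thread)."""
    if active_instances < 0 or instances_per_blade <= 0:
        raise ValueError("need a non-negative load and a positive blade capacity")
    # Blades that actually run instances, rounded up to whole blades.
    instance_blades = math.ceil(active_instances / instances_per_blade)
    # Database and coordination blades are needed regardless of load.
    return instance_blades + database_blades + coordination_blades


if __name__ == "__main__":
    # Hypothetical load levels over a day.
    for load in (500, 4200, 4201):
        print(f"{load} instances -> {blades_needed(load)} blades")
```

With these guessed figures, 4200 active instances would need 48 blades and one more instance would tip it to 49, so renting in blade-sized steps follows the load fairly closely.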
« Last Edit: Nov 23rd, 2011 at 11:53am by Flav »  

 
Balthazar
Re: Hotfix was for lag apparently
Reply #6 - Nov 23rd, 2011 at 1:59pm
Flav wrote on Nov 23rd, 2011 at 11:50am:
I suspect it's just the instance that crashes ( I suspect that a given instance is a thread spawned on a given server, either as a 'general open' [ Market, Harbor, Houses ] instance or as a quest instance ). And the lag spike is generated by the handover to the other server, not by the load... The lag generated by the load is always there and should be constant ( as long as your instance doesn't crash ).


I didn't want to make things too complex...
Several virtual servers spread over several [ a different several ] physical blades, all in a cluster. I don't think they went the mega-server way with several virtual beasts: too much capital tied up in that mega server. It's cheaper to buy blade chassis and fit the number of blades needed... you can then add or remove them dynamically when the load changes, thus maximizing your revenue and minimizing your costs.

The other point is that the former European servers were hosted at an AT&T datacenter in the Netherlands... I don't expect CM to travel there from the UK every week to manage the server hardware. So something that can be soft-configured sounds more appropriate. You rent the hardware: when you need more, you ask for more to be added; when you need less, you release some of the hardware back to the provider.


You would be surprised at how cheap some of the new blade chassis are. I know the Cisco UCS is very affordable (relatively). I don't find it outlandish that they could have full farms of blade servers.
  

 