Vault - Hotfix was for lag apparently


Hotfix was for lag apparently (Read 1633 times)

arkohighstar Ex Member	Hotfix was for lag apparently Nov 22nd, 2011 at 4:28pm
	http://forums.ddo.com/showthread.php?&postid=4184565#post4184565

	Quote:
		I might be able to shed some light on the 'crashes'. This is my conceptual understanding; I'm not an engineer so keep that in mind. The crashes that the hotfix addressed are not client crashes - so if you were seeing some of these, they won't be fixed with yesterday's hotfix.

		For each shard/world there is a machine running a master copy of the server plus many 'instances' of that server. Each instance (basically a copy of the server) handles a certain number of players, spreading the entire load of players amongst the instances. If one of these instances crashes, the remaining instances take on the duties of the failed instance (i.e. handling that player load). However, if the cause of the server instance crash is not a rare event, more server instance crashes will occur, causing more load to be handled by less CPU power. This results in lag.

		We have been experiencing the crashes for some time and there was a potential fix in U12. Unfortunately it didn't fix the problem, but it did provide some logging info to help track down the source of the problem (these can be very tricky to trace). The logging identified a problem; it was fixed and went live with yesterday's hotfix. Note that this fix doesn't mean that all lag magically disappears; there are many potential sources/causes of lag and each one needs to be addressed separately.

	I could see the huge lag spikes being caused by us getting switched over to a new server instance when ours crashes. As for the crashing itself, my bet is runaway memory or bad garbage collection, but that's just a guess.
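	To put rough numbers on the quoted explanation, here is a minimal Python sketch of how the per-instance load grows as instances crash. It assumes the load is split evenly among surviving instances, which the quoted post does not actually state; the function name and the player and instance counts are made up for illustration and say nothing about the real DDO servers.

```python
def per_instance_load(total_players: int, total_instances: int, crashed: int) -> float:
    """Players each surviving instance handles, assuming an even split (an assumption)."""
    survivors = total_instances - crashed
    if survivors <= 0:
        raise ValueError("no surviving instances left to take the load")
    return total_players / survivors


# Hypothetical figures: 1000 players spread over 10 server instances.
for crashed in range(4):
    load = per_instance_load(total_players=1000, total_instances=10, crashed=crashed)
    print(f"{crashed} crashed -> {load:.1f} players per surviving instance")
```

	Each extra crash pushes more players onto fewer instances, which matches the "more load handled by less CPU power" description.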



Flav	Re: Hotfix was for lag apparently Reply #1 - Nov 23rd, 2011 at 3:52am
	Quote:
		I could see the huge lag spikes being us being switched over to a new server as we crash. As for the crashing, my bet is runaway memory, or bad garbage recycling but that's just a guess

	I think he oversimplified the back-end architecture. I'm sure there's more than one physical server running each world.

	As I see it, you have a cabinet of blade systems (say 50 servers). Each server can handle a given number of instances simultaneously (say 100). Of these 50 servers, 2 or 3 are dedicated to database transactions (they are not part of the instance cluster), and 2 or 3 more are dedicated to coordination between all the servers (they are part of the instance cluster but don't run actual instances; they are the ones that tell the other servers to spawn instances when necessary, or that move an instance from one server to another).

	Now, for reliability purposes, every instance spawned is duplicated on two different blades. That way, when the instance crashes on one of the blades, the other blade can take over on the fly while the crashed instance is reset on the relevant blade.

	I suspect that some of the lag spikes we encountered (typical case: when somebody disconnects) were caused by the instance crashing and us being transferred to the second blade that ran the instance we were in. As the blade used in the secondary role in the above example is also handling other instances in the primary role, when we get transferred to it because of a crash we get lag, because we are the low-priority job.

	Anyway, I'm not sure I'm clear...
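	The primary/secondary idea above can be sketched in a few lines of Python. This is only an illustration of the guess made in this post, not how the game servers are known to work; `Blade`, `place` and `fail_over` are hypothetical names, and the instance and blade names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Blade:
    name: str
    primary: set = field(default_factory=set)    # instances this blade actively serves
    secondary: set = field(default_factory=set)  # standby copies of instances served elsewhere

    def active_load(self) -> int:
        return len(self.primary)


def place(instance: str, primary: Blade, standby: Blade) -> None:
    """Spawn an instance on one blade and its duplicate on a different blade."""
    if primary is standby:
        raise ValueError("primary and standby must be different blades")
    primary.primary.add(instance)
    standby.secondary.add(instance)


def fail_over(instance: str, crashed: Blade, standby: Blade) -> None:
    """Promote the standby copy after the primary copy of `instance` crashes."""
    crashed.primary.remove(instance)
    standby.secondary.remove(instance)
    standby.primary.add(instance)
    # The crashed copy is reset on its own blade and becomes the new standby.
    crashed.secondary.add(instance)


blade_a, blade_b = Blade("blade-a"), Blade("blade-b")
place("quest-1", primary=blade_a, standby=blade_b)
place("market-1", primary=blade_b, standby=blade_a)

fail_over("quest-1", crashed=blade_a, standby=blade_b)
print(blade_a.active_load(), blade_b.active_load())
```

	After the failover, blade-b actively serves both instances while blade-a serves none, which is the extra work on the secondary blade that would show up as a lag spike under this model.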

	Yes my avatar is an Hermine eating a Greenland Lemming for brunch.

Deathdefy	Re: Hotfix was for lag apparently Reply #2 - Nov 23rd, 2011 at 4:21am
	Got to say, the weird lag when getting the chests in the first parts of the Shroud seems to be completely gone for me. Whatever the mechanics are, maybe it worked.



arkohighstar Ex Member	Re: Hotfix was for lag apparently Reply #3 - Nov 23rd, 2011 at 9:59am
	Flav wrote on Nov 23rd, 2011 at 3:52am:
		I think he over simplified the back end architecture. I'm sure there's more than one physical server running each world. [...]

	No, that would describe it fairly well, I think. Whatever the mechanics, it appears from his comment that switching to the backup instance is overloading the server. In other words, the failover is enough to keep us from crashing out altogether, but the box is not strong enough to stay performant with the extra load. The real question is whether the whole server is crashing, thus dumping all instances at once, or whether individual instances are being dumped out.



Balthazar	Re: Hotfix was for lag apparently Reply #4 - Nov 23rd, 2011 at 10:57am
	Flav wrote on Nov 23rd, 2011 at 3:52am:
		I think he over simplified the back end architecture. I'm sure there's more than one physical server running each world. [...]

	With some of the new virtualization tech you can actually aggregate multiple blades into one virtual server, so it could be one virtual mega server using the resources of multiple physical blades. This works both ways: you can scale up or down.

	Any fool can criticize, condemn and complain and most fools do.

Flav	Re: Hotfix was for lag apparently Reply #5 - Nov 23rd, 2011 at 11:50am
	Quote:
		The real question is whether the whole server is crashing thus dumping all instances at once or are individual instances being dumped out.

	I suspect it's just the instance that crashes (I suspect that a given instance is a thread spawned on a given server, either as a 'general open' instance [Market, Harbor, Houses] or as a quest instance). And the lag spike is generated by the handover to the other server, not by the load. The lag generated by the load is always there and should be constant (as long as your instance doesn't crash).

	Balthazar wrote on Nov 23rd, 2011 at 10:57am:
		With some of the new virtualization tech can work you can actually aggregate multiple blades into 1 virtual server, so it could be 1 virtual megga server but using the resources from multiple physical blades. This works both ways, scale up or down.

	I didn't want to make things too complex... Several virtual servers over several [different several] physical blades, all in a cluster. I don't think they went the mega server way with several virtual beasts: too much capital invested in that mega server. It's cheaper to buy blade chassis and put in the number of blades needed; you can then add or remove them dynamically when the load changes, thus maximizing your revenue and minimizing your costs.

	The other point is that the former European servers were hosted at an AT&T datacenter in the Netherlands. I don't expect CM to go there from the UK every week to manage the server hardware, so something that can be soft-configured sounds more appropriate. You rent the hardware: when you need more, you ask for more to be added; when you need less, you release some of the hardware to the lender.
	« Last Edit: Nov 23rd, 2011 at 11:53am by Flav »

Balthazar	Re: Hotfix was for lag apparently Reply #6 - Nov 23rd, 2011 at 1:59pm
	Flav wrote on Nov 23rd, 2011 at 11:50am:
		[...] It's cheaper to buy blade chassis and put the number of blades needed... you can then add/remove them dynamically when the load change. [...]

	You would be surprised at how cheap some of the new blade chassis are. I know the Cisco UCS is very affordable (relatively). I don't find it outlandish that they could have full farms of blade servers.

