In this blog post I want to share some experiences and practical tips on using DFS-R to sync local VDisk stores between multiple Provisioning Servers. Let's begin with a quick intro:
If you are streaming your VDisks from the Provisioning Server's local storage, you often want to replicate the stores to the other Provisioning Servers to provide HA for the VDisks and target connections. I'm a big fan of running VDisks from local storage because:
– No CIFS layer between the Provisioning Servers and the VDisks, increasing performance and eliminating bottlenecks
– No CIFS single point of failure
– No expensive clustered file system needed to provide HA for the VDisks
Caching on the device's local hard drive or in the device RAM is the best option when using local storage for your VDisk store; this way you can easily load balance and fail over between Provisioning Servers. Of course there are some downsides to placing the VDisks on local storage:
– Need to sync VDisk stores between Provisioning Servers, resulting in higher network utilization during sync
– Double storage space needed
– Not an ideal solution when using a lot of private mode VDisks (a VDisk in private mode is continuously in use and cannot sync). Luckily we now have the Personal vDisk option in XenDesktop, so IMHO private VDisks aren't really necessary anymore in an SBC or VDI deployment.
Because one size doesn't fit all, you can always mix storage types for storing VDisks depending on your needs, but for Standard mode images combined with caching on the device hard drive or in device RAM, using the local storage of the Provisioning Server is a good option.
Since Provisioning Server version 6, Citrix has added functionality around versioning, and you can now also easily see whether the Provisioning Servers are in sync with each other, but you have to configure the replication mechanism yourself. I have worked with a lot of different replication solutions to replicate the VDisks between Provisioning Servers, from manual copies to scripts using robocopy and rsync, running both scheduled and manually. Lately I use DFS-R more and more to get the job done. Because DFS-R provides a 2-way (full mesh) replication mechanism, it's a great way to keep your VDisk folders in sync, but there are some caveats to deal with when using DFS-R. Below I will give you some practical tips and a scenario you can run into when using DFS-R:
Last Writer wins
This is one of the most important things to deal with: DFS-R uses the last writer wins mechanism to decide which file overrules the others. It's a fairly simple mechanism based on time stamps: whoever changes the file last wins the battle and will synchronize to the other members, overwriting existing (outdated) files!
If you put this mechanism next to the way Provisioning Server works, you will quickly run into the following trap:
Imagine your environment looks like the image below.
Step 1:
Because you want to update the image, you connect to the Provisioning console on PVS01 and create a new VDisk version; this creates a maintenance (.avhd) file on PVS01. Because this file is initially very small, it will quickly replicate to the other Provisioning Servers.
Step 2:
You spin up the maintenance VM. At this point you don't know from which Provisioning Server the maintenance VM will boot (this is decided by the load balancing rules), so let's say it boots from PVS03.
You make changes to the maintenance image and shut it down.
Now the fun part is going to start! Based on the changes you made and the size of the .avhd file, it can take some time to replicate the updated file to the other Provisioning Servers.
Step 3:
In the meantime, still connected to PVS01, you promote the VDisk to test or production.
When you promote the VDisk, the SOAP service will mount the VDisk and make changes to it for KMS activation etc.
Step 4:
You boot a test or production VM from the new version, and you don't see your changes; furthermore, they are lost!
What happened? Well, you ran into the last-writer-wins trap of DFS-R:
The promote takes place on the Provisioning Server the console is connected to, so in this example that is PVS01. PVS01 doesn't have the updated .avhd from PVS03 yet, so you promoted the empty .avhd file that was created when you clicked New Version.
Because the promote action updates the time stamp of the .avhd file, DFS-R will replicate this file to the other Provisioning Servers (again quickly, because it's empty), overwriting the one with your updates.
Here are two options to work around this behaviour:
Option 1:
After you make changes, wait until the replication is finished (watch the replication tab in the Provisioning console; see the sketch below for a quick way to check this from PowerShell) and only promote the version when every Provisioning Server is in sync.
Option 2:
If you can't wait, connect the console to the Provisioning Server where the update took place and promote the new version there, so you are sure you promote the right .avhd file.
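Whichever option you choose, it helps to confirm that the DFS-R backlog is empty before you promote. Here is a minimal sketch, assuming Windows Server 2012 R2 or later where the DFSR PowerShell module is available; the group and folder name PVS_Store_01 match the dfsrdiag example further down, and the server names come from the scenario above:

# Files PVS01 still has to send to PVS03 for this replicated folder;
# a count of 0 means the stores are in sync in that direction.
$backlog = Get-DfsrBacklog -GroupName "PVS_Store_01" -FolderName "PVS_Store_01" `
    -SourceComputerName "PVS01" -DestinationComputerName "PVS03"
"Files waiting to replicate: $($backlog.Count)"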
Below I will give some other practical tips when using DFS-R to replicate your VDisk stores.
1. Ensure you always have enough free space left for staging files and make your staging quotas big enough to replicate the whole VDisk (1.5 times the VDisk size, for example); see the PowerShell sketch after this list
2. Create multiple VDisk stores for your VDisks; this allows you to create multiple DFS-R replicated folders, and replication works better with multiple smaller folders than with one very large folder
3. Watch the event logs for DFS-R related messages; DFS-R logs very informative events to the event log, so keep an eye on high watermark events and other events related to replication issues
4. Check the DFS-R backlog to see what's happening in the background and to verify that there are no files stuck in the queue; you can use the dfsrdiag tool to watch the backlog, for example:
dfsrdiag backlog /receivingmember:PVS03 /rfname:PVS_Store_01 /rgname:PVS_Store_01 /sendingmember:PVS01
5. Exclude .lok files from being replicated; they should not be the same on every Provisioning Server (see the sketch after this list)
6. Plan big DFS-R replication traffic during off-peak hours; when DFS-R is replicating, booting up your targets will be slower. You can also limit the bandwidth used for DFS-R replication traffic
7. Before you start, check your Active Directory schema and domain functional level; if you want to use DFS-R, your Active Directory schema must be up-to-date and support the DFS-R replication objects. Also note that only DFS-R replication is necessary, no domain namespaces are needed.
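To make the tips above a bit more concrete, here is a minimal sketch of how such a replication group could be set up with the DFSR PowerShell module (Windows Server 2012 R2 or later; on older servers the DFS Management console achieves the same). The group/folder name PVS_Store_01 matches the dfsrdiag example above, but the content path D:\Store01 and the 60 GB staging quota are assumptions you should adapt to your own store layout and VDisk sizes:

# One replication group and replicated folder per VDisk store (tip 2)
New-DfsReplicationGroup -GroupName "PVS_Store_01"
New-DfsReplicatedFolder -GroupName "PVS_Store_01" -FolderName "PVS_Store_01"
Add-DfsrMember -GroupName "PVS_Store_01" -ComputerName "PVS01","PVS02","PVS03"

# Full mesh: Add-DfsrConnection creates both directions by default
Add-DfsrConnection -GroupName "PVS_Store_01" -SourceComputerName "PVS01" -DestinationComputerName "PVS02"
Add-DfsrConnection -GroupName "PVS_Store_01" -SourceComputerName "PVS01" -DestinationComputerName "PVS03"
Add-DfsrConnection -GroupName "PVS_Store_01" -SourceComputerName "PVS02" -DestinationComputerName "PVS03"

# Point every member at its local store and size the staging quota
# generously (tip 1), here roughly 1.5x a 40 GB VDisk = 60 GB (assumed sizes)
foreach ($pvs in "PVS01","PVS02","PVS03") {
    Set-DfsrMembership -GroupName "PVS_Store_01" -FolderName "PVS_Store_01" `
        -ComputerName $pvs -ContentPath "D:\Store01" `
        -StagingPathQuotaInMB 61440 `
        -PrimaryMember ($pvs -eq "PVS01") -Force
}

# Keep the server-specific .lok files out of the replication (tip 5),
# while keeping the default exclusions in place
Set-DfsReplicatedFolder -GroupName "PVS_Store_01" -FolderName "PVS_Store_01" `
    -FileNameToExclude "~*","*.bak","*.tmp","*.lok"

After the initial sync has finished you can keep an eye on the backlog with the PowerShell snippet shown earlier or with the dfsrdiag command above.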
Conclusion
I can be very short here: my conclusion is that DFS-R can be a very nice and convenient way to keep your VDisk stores in sync, but you must understand how DFS-R replication works and how it behaves when combined with Provisioning Server. Hopefully this blog post gave you a better understanding of using DFS-R in combination with Provisioning Server; keep the above points in mind when you consider DFS-R as the replication mechanism for your VDisk stores.
Please note that the information in this blog is provided as is without warranty of any kind.
Excellent article. I've been using MelioFS by Sanbolic for some time with PVS, however as you noted shared storage and a 3rd party tool can get very expensive, and I've been looking to utilize local storage to remove the CIFS layer from PVS and vDisks. Thanks for the detailed info.
Is it possible to replicate the vDisks if they are in use by some target devices?
Or can we only replicate the vDisks if they are not in use by the target devices?
Hi sa,
As long as the vdisks are in standard mode (read only) they can be replicated, even when there are active target connections to the vdisk.
Bram
Sounds very complicated and error prone. Also, from my experience with DFS-R, sooner or later DFS-R will blow up in your face. It simply is not one of the best technologies from Microsoft. While DFS-R works well for SYSVOL replication, its actual real-world use cases for replicating real data seem to be extremely limited. So … why bother using DFS-R at all.
From my experience with my customers, the following options are much more stable and reliable (because they don't rely on unnecessary technology):
1. If the customer already has a decently sized FAS (30x0 or higher) with CIFS already enabled, just put the vDisk store on a CIFS share on the FAS and everybody is happy. (This is definitely the preferred option, but obviously only works for NetApp customers)
2. Just copy the .vhd/.avhd and .pvp files manually. It literally takes ONE line of documentation in the operational procedures and it is FAST, RELIABLE and ALWAYS works.
Lastly, I hate to be picky, but your argument against CIFS being an additional technology layer doesn’t really work when your proposed solution is actually just that – to add an additional technology layer.
Cheers,
Christoph Wegener
Hi Christoph,
Thanks for your valuable feedback.
I agree that DFS-R needs ongoing attention when used for VDisk replication; that's why I wrote the article, to highlight these attention points.
In my experience, when these attention points are kept in mind (on an ongoing basis), DFS-R can be a valid replication mechanism. Of course there are other (easier) replication techniques, but they are out of the scope of this blog post, and of course a manual copy of the VDisks works, but that is something I wouldn't advise as a solution to customers; I would at least script/automate this process in some way.
I also agree with you that when you have a reliable CIFS source like a NetApp, or another CIFS share tuned for PVS with enough storage and bandwidth, this can be a preferred option over local storage, but this is something that needs to be in place, and an HA CIFS solution can quickly become an expensive option.
Lastly, by calling CIFS an additional layer, I meant in the streaming process as opposed to local storage, not in the replication process.
Cheers,
Bram
Hey Bram
Nice post on PVS + DFS. Actually we have been doing this for several years on our different deployments, and we have NEVER had any problems with the replication. I'll add some of my experiences...
Since the built-in versioning in PVS was added, with the maintenance machine booting from the same folder (just another .avhd file), we experienced a major performance loss while DFS-R is active. I know we could just schedule the replication to off-hours, but normally we need to boot the test machine asap after promoting a version to test. What I do is simply disable the DFS-R service on one PVS host while working on a new version and enable it again when finished. It will normally replicate pretty fast, because the nature of .vhd + .avhd gives you one big file plus several smaller ones. It takes some manual work, but it's really no big deal.
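For reference, pausing and resuming replication this way is just a matter of stopping and starting the DFS Replication service on that host; a minimal sketch, assuming the default Windows service name DFSR:

# Pause replication on this PVS host while working on a new version
Stop-Service -Name DFSR
# ...do the vDisk maintenance and promote the version...
# Resume replication so the promoted version syncs to the other hosts
Start-Service -Name DFSR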
\\morten
Thanks Morten!
I think your blog is amazing. You write about very interesting things. Thanks for all the tips and information
Bram, great article and some good points in the other comments also. We have done various combinations of PVS storage solutions for customers, and in the end it comes down to customer preference, budget, existing storage capability (i.e. do they have a NetApp or equivalent CIFS filer?) and so on; there is no 100% right or wrong answer.
Magnar Johnsen wrote a handy little DFS config script for those not quite sure how to set it up to suit PVS: http://virtexperience.com/2012/09/19/citrix-pvs-dfs-replication-configuration-script/
Regards
Dan
Pingback: Sync PVS vDisk Stores
Pingback: “Management Interface: VHD Error” with PVS and DFSR » Ingmar Verheij - The dutch IT guy
Great article. DFS-R is something that I will look into now, as I am currently copying the vDisks manually. One problem I have when copying the files manually (and I was curious if using something like DFS will help): sometimes after I copy the files over manually, the replication status in the PVS console will say that the files are not synced. Is that tied to the timestamp in any way, or possibly something else going on?
Thanks again for the great article.
Thanks! Yes that is correct, it is important that the timestamps match.
DFS-R and also sync tools like robocopy will copy the timestamps for you; a manual copy will create new timestamps and cause out-of-sync issues.
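For anyone scripting it in the meantime, here is a minimal one-way robocopy sketch (the server name, store path and file patterns are just examples to adapt); robocopy copies data, attributes and timestamps by default (/COPY:DAT), which keeps the PVS replication status happy:

# Copy vhd/avhd/pvp files to the second PVS server, skip files that are
# newer on the destination (/XO) and leave the server-specific .lok files alone
robocopy D:\Store01 \\PVS02\D$\Store01 *.vhd *.avhd *.pvp /XO /XF *.lok /R:1 /W:5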
Bram
Thanks for the quick reply... I will look into robocopy in the short term but work on the DFS option. Thanks again... awesome site.
It seems that it is no longer the recommendation of Citrix to filter out the .lok files.