In November we had a situation where our production environment had records unintentionally deleted. This required us to restore a backup copy of production into a new environment and retrieve the missing records from it.
Why didn’t we restore the environment backup directly over the production environment? There are many reasons which I’ve documented in the Forward Forever team blog. In short, if you’ve got any Power Apps canvas apps or Power Automate cloud flows in your environment, things can get seriously messed up if you restore the backup into the same environment. My recommendation is to avoid doing this in production if you have any workarounds at your disposal.
After we had manually copied & imported the data back, we left that restore environment in place for a while. In this case, “a while” actually meant 6 months. We were in no rush to free up the capacity, so I decided to wait and see if there were any further lessons to be learned from this incident.
What happens to the storage space of a restored environment that no one is using? You might expect it to remain roughly the same size as the original backup. In our case, the restore environment grew to be over 2x the size of the original environment. Below is an illustration of the restore vs. production environment storage usage from Power Platform Admin Center reports:
Our production today is at around 7 GB total Dataverse storage consumed, whereas the production restore environment had ballooned to 17 GB. What was consuming all that space? The AsyncOperation table:
This is where all the Dataverse system jobs are stored. These jobs will keep running, even if no live users (nor outside integrations) touch the environment.
Looking at the number of rows in that table (via XrmToolBox plugin Fast Record Counter), I saw that while our production environment had 8.4k rows, the restore environment had 51k rows in that table.
Why are there more jobs in the dormant environment? Normally, completed system jobs are deleted by other scheduled jobs, known as bulk delete jobs. In this restore environment, though, the jobs just kept piling up. I checked that the bulk delete jobs weren’t reporting any errors. It was the system jobs themselves that offered the explanation for the storage space growth:
Switching to the suspended system jobs view revealed that there were 3.5k system events stuck. New batches seemed to be generated on a daily basis. With titles like “Microsoft.Dynamics.CDD.AuthorizationCorePlugins.RoleAutoExpanderPlugin”, it wasn’t immediately obvious what these jobs were related to.
Upon inspecting the system job records, the column “message name” revealed that these are related to solution imports and updates. Yes, just because you stop using a Dataverse environment, that doesn’t mean Microsoft will stop servicing it with the latest solution versions and new features.
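For anyone who prefers to check this outside the system jobs views, the same inspection can be sketched in code. This is a minimal sketch, not the method used above: it assumes the system job table is reachable through the Dataverse Web API as the `asyncoperations` entity set, that `statecode` 1 means Suspended, and that the “message name” column has the logical name `messagename`. The organization URL and the sample rows are made up for illustration.

```python
from collections import Counter
from urllib.parse import quote, urlencode

# Assumption: statecode 1 corresponds to the "Suspended" state of a system job.
SUSPENDED_STATECODE = 1


def suspended_jobs_url(org_url, api_version="v9.2"):
    """Build an OData query URL that lists suspended system jobs."""
    query = urlencode(
        {
            "$select": "name,messagename,createdon",
            "$filter": f"statecode eq {SUSPENDED_STATECODE}",
        },
        safe="$,",        # keep OData's "$" and "," readable in the URL
        quote_via=quote,  # encode spaces as %20 rather than "+"
    )
    return f"{org_url.rstrip('/')}/api/data/{api_version}/asyncoperations?{query}"


def count_by_message(jobs):
    """Tally job rows (dicts) by their message name."""
    return Counter(job.get("messagename") or "(none)" for job in jobs)


# Hypothetical rows, shaped like the "value" array of a Web API response.
sample_jobs = [
    {"name": "job 1", "messagename": "MessageA"},
    {"name": "job 2", "messagename": "MessageA"},
    {"name": "job 3", "messagename": None},
]

print(suspended_jobs_url("https://example.crm.dynamics.com/"))
print(count_by_message(sample_jobs).most_common())
```

The URL builder only produces the query string; sending the request with a valid access token is left out on purpose, since authentication depends on how your tenant is set up.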
Why did the jobs get suspended then? The answer is in what happens after restoring the environment from a backup. It gets put into administration mode by default. The intention here is quite sensible, since you wouldn’t want any integrations from the newly restored environment to be talking with the outside world. This could cause issues when you’d have multiple Dataverse environments connecting to the same target systems, potentially causing duplicate data and messages to be created.
The challenge here is that in today’s Dataverse / Dynamics 365 environments there are first-party integrations that also rely on features that the admin mode by default disables. These will keep running as system jobs inside the environment, yet they can’t complete their tasks and are therefore put in the queue as suspended jobs.
In a small CRM style environment like we have, this caused 10 GB worth of additional data to get accumulated into the Dataverse tables within 6 months. While system jobs are now stored in the cheaper file capacity rather than the expensive database capacity, it’s still quite a lot of unwanted storage consumption from built-in features.
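To put the figures quoted above side by side, here is a quick back-of-the-envelope calculation. It uses only the numbers from this post (7 GB vs. 17 GB, 8.4k vs. 51k rows, 6 months) and treats today’s production size as the baseline, which is a simplification since the backup was taken back in November.

```python
# Figures quoted in this post.
production_gb = 7
restore_gb = 17
months_dormant = 6
production_rows = 8_400
restore_rows = 51_000

extra_gb = restore_gb - production_gb        # additional storage in the restore
size_ratio = restore_gb / production_gb      # restore size relative to production
gb_per_month = extra_gb / months_dormant     # average growth while dormant
row_ratio = restore_rows / production_rows   # AsyncOperation rows, restore vs. production

print(f"Extra storage: {extra_gb} GB ({size_ratio:.2f}x production)")
print(f"Average growth: {gb_per_month:.2f} GB per month")
print(f"AsyncOperation rows: {row_ratio:.1f}x production")
```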
Obviously the administration mode is not designed to be a permanent state for any Power Apps or Dynamics 365 solution’s hosting environment. This does highlight the fact that it’s not possible to simply “freeze” a Dataverse environment and keep a snapshot of your data and configuration for a longer duration in the MS cloud. All live environments will get updates to system solutions sooner rather than later, thus altering the state of the database. While the business data in the Dataverse tables will be preserved as-is, the metadata and its surrounding maintenance processes will keep on living their lives.
The Common Data Service is often thought of as a relational data store that resembles the former XRM database. While there is backward compatibility in the sense that you can do everything with CDS that you could in XRM (Online), the real power of the cloud platform comes from going beyond those limits. Earlier I’ve talked about how The Real Common Data Emerges as we start to work with a variety of different data types and even reaching out into the Azure Data Lake as part of leveraging the Dynamics 365 first-party apps. This time I want to drill deeper into the specifics of images and files as new field types available in CDS.
Feature announcements from Microsoft
The concept of CDS heterogenous data storage was demonstrated back in Business Application Summit 2019 in June. As illustrated below, alongside the traditional SQL database for relational data, CDS now also offers the option for binary (file) data stored in Azure Blob Storage and log data in CosmosDB, as well as search indexes offered via Azure Search. All of these are available under the common entity schema and business logic defined in CDS, without requiring the app makers to think about what data goes into which specific service and how. This is how the Power Platform provides a higher level of abstraction compared to code based app development on top of the raw Azure services.
And here’s the equivalent slide for file data type:
In December 2019 there have now been announcements made on the Power Apps blog about Introducing File and Image datatype, as well as the availability of the public preview of file attributes on Power Apps Canvas apps. Things are still in the process of rolling out and support for Model-driven apps hasn’t yet even been demonstrated anywhere, so this isn’t something you can jump into using right away.
Scenarios for files in CDS
Traditionally in the world of CRM projects we’ve always advised against putting files into the database. “Keep ’em in SharePoint” has been the standard answer, which still makes a lot of sense for any collaboration on content creation, document versioning and so on. The SharePoint document management integration in CDS offers an out-of-the-box experience that generates document locations linked to specific records in CDS and allows working with them through the Model-driven app UI. If you’re happy with auto-generated folders under a single document library on a single SharePoint site for all the records of a particular CDS entity (like accounts), there’s no need to look any further. In real world customer environments the OoB integration is often not sufficient, and I’m really glad that things are improving with the Microsoft Teams based document management integration offering a more practical security and data location model. (Note that currently the Teams integration can’t be enabled for pure CDS environments without Dynamics 365 apps.)
The problem that still remains is that both the direct SharePoint-CDS integration as well as the Teams-SharePoint-CDS combo don’t offer much business process context for the files. It’s more of a helpful tip like “hey, if you’re looking for documents related to this account/project/order/inspection, try searching from this folder”, rather than a very specific instruction about which particular file contains information on what step of a business process managed in CDS. You also can’t really verify whether a required document exists in the system before proceeding further in the process, since all you have is links to a folder which might or might not contain that file – or multiple copies and various different file types when what you’d really need is a single required PDF, like a signed agreement document.
With the new file and image datatypes, you can actually define a specific field to store a specific type of document or image. This will let you know exactly what the business purpose of a particular piece of binary data is, which means you can develop app functionality like user interface and business logic around it. It’s no longer just 0…N files linked from another system (like SharePoint), it becomes an integral part of your business process. The demo that Ryan Jones did at Ignite about an inspections app is a good example of what the “strongly typed” image and file data could be in practice:
Having rich metadata about what a particular document represents in the real world is great, but what’s an even bigger benefit is the security model around it. As anyone with an XRM background knows, the ways in which you can configure the security model in CDS are very advanced, offering granular control of who can create, read, update and delete data. You have security roles, business unit hierarchies, position hierarchies, owner teams, access teams, sharing, even field level security. Any security logic that you apply on the entity that’s hosting the file or image field will also guard access to the binary content. If you’ve ever tried to sync the security information between a Dynamics 365 based CRM system storing customer records and the SharePoint environment storing the related documents, you’ll know how difficult and error prone this attempt is. The security concepts of those two systems are inherently different and even Microsoft is unlikely to ever offer anything close to 1:1 integration (Teams is about as close as you can get). For access control of sensitive documents and images, these new datatypes in CDS are therefore a very attractive option.
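To make the difference between a linked folder and a single-purpose file field a bit more concrete, here is a small sketch in plain Python. It doesn’t call any CDS API; the `Inspection` record, its `signed_agreement` field and the `can_proceed` rule are all hypothetical names picked for illustration. The point is simply that once a file lives in a dedicated field on the record, the process can check for it directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileValue:
    """Binary content stored in a dedicated file field (illustrative only)."""
    file_name: str
    content: bytes


@dataclass
class Inspection:
    """Hypothetical business record with one file field for one purpose."""
    name: str
    signed_agreement: Optional[FileValue] = None


def can_proceed(inspection: Inspection) -> bool:
    """Allow the next process step only if the required PDF is present."""
    doc = inspection.signed_agreement
    return doc is not None and doc.file_name.lower().endswith(".pdf")


draft = Inspection(name="Site visit 1")
signed = Inspection(
    name="Site visit 2",
    signed_agreement=FileValue("agreement.pdf", b"%PDF-"),
)

print(can_proceed(draft))   # no file in the field yet
print(can_proceed(signed))  # required PDF is in place
```

With a folder link, the equivalent check would have to guess which of the 0…N files in the folder is the one the process needs.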
Attachments (annotations) vs. file/image datatype
Let’s not forget that it has always been possible to store binary data in CDS, even in the on-premises days. All your tracked Dynamics 365 emails will have automatically uploaded their every attachment into the Note entity. Additionally the users have been able to add notes with attachments on the Timeline of any entity where the attachment feature has been enabled.
As part of the new storage capacity model launched in April 2019, Microsoft will have already migrated all of the attachments previously stored in the SQL database to Azure Blob Storage behind the scenes for any Online environment. However, this doesn’t make the attachments feature any more modern and you should seriously consider not using it in the future (where possible). While there is a somewhat better security story with the data all being behind CDS APIs, you won’t find any customization options here to align the data in Notes entity with your business requirements nor the desired application UI. It’s a fixed way of representing file data alongside the customizable relational business data model, inherited from the Dynamics CRM days rather than a feature designed for the Power Platform era.
In the meantime, with the lack of better support for image handling, many of us have surely explored the capabilities of building a Power Apps Canvas App that could perform what the above Ignite inspection demo app does. Dropping a camera control on the Canvas App is so easy, yet storing the captured image into CDS alongside the other inspection data has been next to impossible. Yes, attachments as a separate control has been available for Power Apps makers for quite some time, but patching the image data from somewhere else into a new CDS attachment record is the tricky part. Complex record references like the Regarding field on the Note entity in CDS have long been a stumbling block for Canvas Apps, and as of today you still can’t write data to that field. Jumping through hoops made of Flows and Custom Connectors is hardly the kind of seamless experience you’d expect from a low-code application platform when working with camera images, so there definitely has been a big demand for the image datatype to come and replace the clunky attachment feature.
Back when CDS was just the storage place for structured data that was accessed via the metadata driven UI of Model-driven apps for CRM scenarios, there weren’t that many places where visually pleasing stuff like images could have been used. The entity image with its glorious 144×144 resolution has been cool for demo data, but how many customers have actually ended up populating logos, profile pics, product images or other visuals in there? With the rise of citizen developers armed with Canvas apps that offer pixel perfect UI development, the situation is now quite different and there’s an expectation to be able to work with full-size images as well as showing thumbnails for visualizing the business records.
Things to keep an eye on
As mentioned earlier, we haven’t yet been able to truly validate in real life Power Apps what functionality files and images support. I’m expecting there will be further chapters in the story of how heterogenous storage in Common Data Service evolves over time, so the first release for Canvas Apps and later Model-driven apps may not yet be feature complete. How the data will work with PCF controls/components and other features of Power Platform (automation, offline, search, AI…) is going to be a big factor in deciding whether storing files and images into the dedicated CDS datatypes is the right call for your app. Of course you’ll also need to examine the options from the other Microsoft clouds: Azure and Office.
If you’re doing custom code development and expect to deal with a large amount of binary data in your app, doing the math on storage cost between the configuration friendly CDS and raw Azure Blob Storage is probably going to be an item on your solution design agenda. Just like with relational data, CDS is always going to be priced as a premium service compared to things like Azure SQL, because it provides you so many layers of additional features you’d otherwise have to build and maintain. Storage is only one part of the equation, but of course you’ll need to ensure the business case is valid when consuming Power Platform storage capacity with its associated services.
If the applications you build are aiming to support the collaboration of information workers over unstructured data like Word documents with co-authoring and several versions, then that data clearly doesn’t belong in CDS. Use MS Teams as the security mechanism where possible and allow the users to work with the documents through SharePoint, offline synced OneDrive folders, Office applications on any device etc. If there is an end product that comes from this collaborative process and needs to be carried along in a structured business process, then that file could well be stored in a CDS file attribute.
It will be interesting to see how Microsoft will align these file & image attribute features with the existing attachments feature. Having a predefined number of fields per entity where you can drop a single file is obviously quite a different experience than an open “folder” that could accept as many files as needed. Then again, on the schema level the Notes (annotation) collection also accepts just one attachment per note and the rest is just UI. Whether we’ll receive a true customizable Notes feature from MS with metadata support and a modern control to complement the standard Timeline visualization on the Model-driven app as well as the attachments control on the Canvas side is something that remains to be seen. I’m also expecting to see some community contributed PCF controls in the near future around the new datatypes.
When Microsoft announced one year ago that XRM would become CDS v2.0 (officially Common Data Service for Apps), there wasn’t yet any big system redesign implemented to make this a physical reality. Today we are much further down that road where CDS truly becomes a Service that has less and less to do with the familiar XRM databases that we’ve previously been working with. In this blog post I’ll explore the three data related dimensions that give us an indication of where CDS is heading as a part of the Microsoft Power Platform.
CDS is now Dataverse!
While reading this article, you can translate the term “Common Data Service” to now refer to its new name, Microsoft Dataverse. See this post for comparison between CDS vs. Dataverse.
Dynamics 365 Storage Model Changes
As a part of the April 2019 release train, MS is changing the way data storage is managed for both Dynamics 365 and PowerApps customers. It hasn’t been an official feature bullet on the release notes document, but that doesn’t mean its significance is any less than that of the shiny apps demonstrated in the April 2nd Virtual Launch event.
A new version of licensing guides for Dynamics 365 and also for PowerApps and Flow (for the first time ever!) was released in April. This outlines the commercial impact of the new model to customers, which is probably what most of us will have first paid attention to. Yeah, whenever the pricing mechanism of a widely used MS cloud service changes, it will be a big deal. What makes it even trickier is that MS considers storage as a “subscription add-on” for which they don’t publicly disclose any per GB list prices. I’m not entirely sure this model is beneficial for their ambitions of turning Power Platform into an actual foundation for building third party and customer specific apps, but I guess the shadow of the old CRM and ERP world still looms above this world when it comes to licensing and pricing practices.
Let’s forget licensing for a moment and focus on the technical changes for Dynamics 365 online environments. All of the existing data that used to be stored in the Azure SQL relational database will in the future be divided into three specific storage types: database, file, log. This should have no immediate impact on customers, as the migration will be taken care of by MS. Their promise is that nothing should change in the way users and developers work with data, since the APIs that govern access to this data will remain unaffected.
File data will be in Azure blob storage, as this is the most efficient way to handle miscellaneous documents, images and other “stuff” that may end up inside a typical Dynamics 365 system via features like email tracking that carries over the attachments. Why would you ever store this in a relational SQL database to begin with? Well, the simple reason is that the original on-prem architecture of XRM had no other secure place to put these items, so it was all lumped in there. Now that CDS is a native cloud service, there are many more options available.
Log data will be in Cosmos DB. This will probably offer a more suitable architecture for managing things like plugin trace logs, audit data and other items of similar nature. What should be noted is that Microsoft’s plans don’t just stop at this IT admin activities level. In a recent podcast by MVP Mark Smith, we heard the General Manager of Power Platform, Charles Lamanna, describe this storage type to be designed as the future place for other types of observational data, too. Charles referred to things like IoT device sensor data, which should give you an idea of how this again is data that is A) relevant to many CRM use cases and B) in no way optimal to be stored inside that relational XRM database.
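The split described above can be summarized as a simple lookup. This is only an illustrative sketch based on the examples mentioned in this post; the category labels and the `storage_for` helper are my own naming, not anything exposed by the platform, and the real assignment of tables to storage types is decided by Microsoft behind the APIs.

```python
from enum import Enum


class StorageType(Enum):
    DATABASE = "relational SQL database"
    FILE = "Azure Blob Storage"
    LOG = "Cosmos DB"


# Examples taken from the text; anything not listed is assumed to stay relational.
EXAMPLES = {
    "email attachments": StorageType.FILE,
    "plugin trace logs": StorageType.LOG,
    "audit data": StorageType.LOG,
}


def storage_for(kind: str) -> StorageType:
    """Return the storage type for a kind of data, defaulting to the database."""
    return EXAMPLES.get(kind, StorageType.DATABASE)


for kind in ("email attachments", "audit data", "account records"):
    print(f"{kind}: {storage_for(kind).value}")
```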
One significant and very welcome change that is introduced as a part of this new model is that there will no longer be any license cost tied to the number of instances you have in the cloud. Previously you had to buy add-on licenses for acquiring production and non-production (sandbox) instances for developing, testing, training and in general managing your complex Dynamics 365 online environment. Once the new subscription terms kick in, you’ll have the ability to create as many instances as you like, provided that you have sufficient database capacity available. A major driver behind this change is surely the PowerApps side, in which the licensing terms already granted any user with PowerApps P2 license to create 2 CDS environments for their applications. (For more details, see my presentation on Demystifying Dynamics 365 & Power Platform licensing.)
In the short term, this storage model change should not result in many functional changes for the Dynamics 365 customers. Depending on when your current subscription renewal date is, the new terms will be applied either at that point in time or the renewal after that (if you choose to hold on to the old model for one more subscription period). Any new customer will likely be leveraging the new pricing model starting from April 2019.
It’s important to understand that the actual data storage technology change and the commercial terms that are applied are not tied to one another. Migration of your Dynamics 365 data to the new database/file/log model will probably take place much sooner than what you’ll see in your subscription fees. Refer to the admin documentation on Common Data Service storage capacity for details on how you’ll be able to analyze and manage your storage consumption in this new model.
Diving Into The Data Lake
When looked at purely from the storage license model changes for Dynamics 365 customers, the story would end here, with the three storage types. However, the bigger picture of how data is used as a part of the Customer Engagement systems that cover various digital touchpoints is much broader. Or should I say “bigger” as in Big Data? As much as I dislike the casual use of tech marketing hype terms like Big Data and Artificial Intelligence, there’s no escaping the fact that the familiar world of CRM systems founded on SQL databases is being disrupted by what machine learning models and big data systems can offer today.