Manage Repository Storage Space
Overview
As the repository is used, it can collect a great deal of obsolete or unused objects. Because all of this content is live and indexed, it can degrade performance and push the database toward its maximum allocation. A number of features are provided to help manage and eliminate this unwanted clutter.
As with any relational database application, it is very difficult to measure the space used by an application concept, because it is spread across many tables (as opposed to a file system). By design, this is virtually impossible in the case of MM, where there are 4 levels of complexity:
- The meta-meta-model, which is a simple ERA / object model implemented in the repository database by only a few tables,
- itself used to implement the meta-models (profiles) for various technologies such as relational databases, data lakes, data integration, business intelligence, etc.,
- themselves used to implement the model instances such as the finance staging database, the data warehouse, etc.,
- themselves dispatched under multiple versions with incremental harvesting based on multi-model reuse between versions. For example, the first version will have all schemas of the data warehouse, but successive versions will only import the schemas that changed, reusing the others from previous versions. Deleting the first version will not free the space of schemas that are reused by newer versions.
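The effect of reuse between versions can be illustrated with a small sketch. It is not the repository's actual storage layout: the dictionary structure, the content identifiers and the function name are assumptions made only to show why deleting an early version reclaims little space.

```python
def reclaimable_on_delete(versions, to_delete):
    """Return the stored contents referenced only by the version being deleted.

    versions maps a version name to {schema name: stored content id}.
    This structure is hypothetical and only models reuse between versions.
    """
    still_used = set()
    for name, contents in versions.items():
        if name != to_delete:
            still_used.update(contents.values())
    return set(versions[to_delete].values()) - still_used


versions = {
    # The first version imports every schema of the data warehouse.
    "v1": {"SALES": "sales@1", "HR": "hr@1", "FIN": "fin@1"},
    # The second version re-imports only SALES and reuses HR and FIN from v1.
    "v2": {"SALES": "sales@2", "HR": "hr@1", "FIN": "fin@1"},
}

print(reclaimable_on_delete(versions, "v1"))  # {'sales@1'}
```

Only the SALES content of the first version becomes reclaimable; the HR and FIN contents stay because the newer version still refers to them.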
Usage
Steps
- Ensure that all model Imports are performed with incremental harvesting (where available)
Delete unused test and sandbox type content
Oftentimes, special projects come and go, but their content is left lying around the repository and clutters up the database. It is important to periodically analyze what is in the repository and why.
This is a manual process of browsing through the Repository tree and identifying what should be deleted.
Delete unused configuration versions
Oftentimes, a "frozen" copy of a configuration is created for "backup" or "historical analysis". In some cases this is important. However, as more and more of these collect, they may have an impact on space and performance. Most of these "historical" versions are of no value to keep around. Also, they consume resources, such as disk space, index size, performance of search, etc. Finally, some configuration management processes create a new version (migrate) each time the complete metadata is harvested, so that one may ensure that the metadata is good before publishing. Obviously, these older versions should be deleted.
To determine the need in this area, it is a good idea to periodically run the Repository Statistics operation. If it shows a high ratio of configuration versions to configurations (not counting versions), you would likely benefit from some pruning here.
In this case we have only two configurations and yet eight different versions. It may be time to prune.
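As a rough illustration of that check, the ratio can be computed as below. This is a sketch only: the input structure, the configuration names, the way the eight versions are split between the two configurations, and the threshold are all assumptions, not output of the Repository Statistics operation.

```python
def versions_per_configuration(config_versions):
    """Average number of versions per configuration.

    config_versions maps a configuration name to the list of its version names.
    """
    if not config_versions:
        return 0.0
    total_versions = sum(len(names) for names in config_versions.values())
    return total_versions / len(config_versions)


# Two configurations holding eight versions in total.
# Names and the 7/1 split are made up for illustration.
stats = {
    "Configuration A": ["v1", "v2", "v3", "v4", "v5", "v6", "v7"],
    "Configuration B": ["v1"],
}

ratio = versions_per_configuration(stats)
print(ratio)  # 4.0

PRUNE_THRESHOLD = 3.0  # arbitrary cut-off chosen for this sketch
print(ratio > PRUNE_THRESHOLD)  # True
```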
You may use the manual process of browsing through the Repository tree and identifying which older configuration versions should be deleted.
In addition, you may periodically run the Repository Configuration Statistics operation on the current configuration for more details about what older models may need deletion.
In this case we can see that there are 7 versions of the Demo Enterprise Architecture configuration, but all of them contain the same versions of the models and are thus superfluous. Only the Development and Published versions are required at this time.
Delete unused model versions (built-in operation)
As new versions of models and physical data models are harvested, they begin to collect. Most of these "historical" versions are of no value to keep around. Also, they consume resources, such as disk space, index size, performance of search, etc.
MetaKarta provides a tool to manage and eliminate these older versions: a script operation named Delete unused versions. When run, it deletes every model version that is not in any current configuration version and is more than one hour old.
As Delete unused versions is an operation, you may define a schedule for it so that it cleans up the older unused versions of models periodically and automatically.
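The selection rule described for Delete unused versions (not referenced by any current configuration version, and more than one hour old) can be sketched as follows. The function, the tuple layout and the version identifiers are hypothetical; this is not the operation's implementation, only a restatement of the rule in code.

```python
from datetime import datetime, timedelta


def unused_versions(model_versions, in_use_ids, now, min_age=timedelta(hours=1)):
    """Return ids of model versions that match the deletion rule.

    model_versions: list of (version_id, created_at) tuples (hypothetical layout).
    in_use_ids: ids referenced by any current configuration version.
    """
    return [
        version_id
        for version_id, created_at in model_versions
        if version_id not in in_use_ids and now - created_at > min_age
    ]


now = datetime(2024, 1, 1, 12, 0)
model_versions = [
    ("orders@1", datetime(2023, 12, 1, 9, 0)),   # old and not referenced
    ("orders@2", datetime(2023, 12, 15, 9, 0)),  # old but still referenced
    ("orders@3", datetime(2024, 1, 1, 11, 30)),  # not referenced, only 30 minutes old
]
in_use = {"orders@2"}

print(unused_versions(model_versions, in_use, now))  # ['orders@1']
```

The one-hour minimum age keeps a freshly harvested version from being removed before it has been added to a configuration.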
Ensure that all model Imports are performed with incremental harvesting (where available)
Not only does incremental harvesting save a great deal of processing time when harvesting, it also consumes orders of magnitude less space. Only those portions of the entire harvested model that have changed are re-imported and written as new versions to the repository database. The rest of the models are simply reused in the new harvested version.
This consideration is especially important for daily harvesting of large databases, file systems and BI servers.
You will have to go to each of these larger models and ensure that incremental harvesting has not been disabled manually using the "-cache.clear" option in the Miscellaneous bridge parameter. If the option is there, remove it and SAVE the Import Setup. See importing models as well as using "-cache.clear" manually.
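If the Miscellaneous parameter values of your models are available as plain text (for example, copied out of each Import Setup), a check like the following could list the models to review. It is a sketch under assumptions: the dictionary of model names to parameter strings is hypothetical, options are assumed to be separated by whitespace, and `-placeholder.option` is a made-up stand-in rather than a real bridge option.

```python
def disables_incremental(miscellaneous):
    """True if the Miscellaneous parameter text contains the -cache.clear option."""
    return "-cache.clear" in miscellaneous.split()


def models_to_review(import_setups):
    """Return the names of models whose Miscellaneous parameter has -cache.clear."""
    return sorted(
        name for name, misc in import_setups.items() if disables_incremental(misc)
    )


# Hypothetical model names and parameter values.
import_setups = {
    "Data Warehouse": "-cache.clear",
    "Finance Staging": "",
    "BI Server": "-placeholder.option -cache.clear",  # -placeholder.option is not a real option
}

print(models_to_review(import_setups))  # ['BI Server', 'Data Warehouse']
```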
Delete operation logs
Debug logs from past harvests can build up and take up significant space. Although they are not indexed and should not affect performance by themselves, they can definitely push against the database allocation.
This consideration is especially important on daily harvesting of large databases, file systems and BI servers.
MetaKarta provides a tool to manage and eliminate these older logs. There is a script operation named Delete operation logs. When run, it removes logs older than a certain number of days and allows filters for failed vs. successful logs.
As Delete operation logs is an operation, you may define a schedule for it and that way it will clean up the older unused logs of models periodically and automatically.
Turn off the system wide Debug logging
As debug logs may be an order of magnitude greater in size than the normal execution logs, they can take up significant space. Although they are not indexed and should not affect performance by themselves, they can definitely push against the database allocation.
This consideration is especially important on daily harvesting of large databases, file systems and BI servers. Also, the setting is system wide, so once you are through testing or reporting a ticket (which requires debug-level logging) it is important to turn it back off.
You may turn the Debug logging on and off under MANAGE > System.
Run Database maintenance
Most databases do not perform a true delete and re-index when objects are deleted. Thus, there will be no real impact on the total size of the database without running database maintenance.
MetaKarta provides a tool to perform database maintenance. There is a script operation named Run database maintenance. This is a system management task which should be scheduled periodically.
If a large number of models and versions are deleted at once, it may be necessary to execute the Database maintenance operation many times.
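The general behavior behind this section, that deleting rows does not by itself shrink a database, can be reproduced on a small scale. The sketch below uses SQLite from the Python standard library purely as a stand-in; it is not the repository database, and the Run database maintenance operation may work quite differently. The table name and row counts are arbitrary.

```python
import os
import sqlite3
import tempfile


def demo_sizes(rows=2000, payload_bytes=1000):
    """Return the database file size before delete, after delete, and after VACUUM."""
    sizes = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        try:
            # Make sure freed pages are not handed back to the file system automatically.
            conn.execute("PRAGMA auto_vacuum = NONE")
            conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, payload BLOB)")
            conn.executemany(
                "INSERT INTO objects (payload) VALUES (?)",
                ((b"x" * payload_bytes,) for _ in range(rows)),
            )
            conn.commit()
            sizes["before"] = os.path.getsize(path)

            # Deleting every row only marks the pages as free inside the file.
            conn.execute("DELETE FROM objects")
            conn.commit()
            sizes["after_delete"] = os.path.getsize(path)

            # An explicit maintenance step rebuilds the file and releases the space.
            conn.execute("VACUUM")
            sizes["after_vacuum"] = os.path.getsize(path)
        finally:
            conn.close()
    return sizes


sizes = demo_sizes()
print(sizes["after_delete"] >= sizes["before"])       # True: the file did not shrink
print(sizes["after_vacuum"] < sizes["after_delete"])  # True: space released after VACUUM
```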
Compress the database
Many databases do not release allocated space until explicitly told to. Please work with your repository database administrator to ensure that the database is tuned and pruned appropriately.
Please see the Database Administration section in the Deployment Guide for guidance with the embedded PostgreSQL repository database.