Upgrade from Archivematica 1.15.x to 1.16.0¶
On this page:
- Clean up completed transfers watched directory
- Create a backup
- Upgrade on Ubuntu packages
- Upgrade on Rocky Linux/Red Hat packages
- Upgrade on Vagrant / Ansible
- Upgrade in indexless mode
- Upgrade with output capturing disabled
- Update search indices
- Review the processing configuration
- Migrate from MySQL 5.x to 8.x
Note
While it is possible to upgrade a GitHub-based source install using ansible, these instructions do not cover that scenario.
Clean up completed transfers watched directory¶
Note
Ignore this section if you are upgrading from Archivematica 1.11 or newer.
Upgrading from Archivematica 1.10.x or older to Archivematica 1.16.0 can result in a number of completed transfers appearing as failed in the Archivematica dashboard, as well as corresponding failure notification emails being sent. These are not actual failures, but are unintentional side effects of changes made in Archivematica 1.11 to the workflow and to how metadata files are stored and copied into the SIP.
To prevent these failures from occurring during an upgrade from Archivematica 1.10 or earlier:
1. Confirm that all transfers and ingests are complete. Check that there are no transfers or SIPs that are still being processed or awaiting decisions in the Transfer and Ingest tabs. If there are, finish processing the transfers/ingests before proceeding.
2. Delete all contents of the completedTransfers watched directory (a quick check that the deletion succeeded is sketched after this list):
sudo rm -rf /var/archivematica/sharedDirectory/watchedDirectories/SIPCreation/completedTransfers/*
3. Perform the upgrade as described below.
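For example, you can verify that the watched directory is empty before upgrading (a minimal check, using the same path as above):
ls -A /var/archivematica/sharedDirectory/watchedDirectories/SIPCreation/completedTransfers/
Empty output means there is nothing left behind that could reappear as a failed transfer after the upgrade.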
Create a backup¶
Before starting any upgrade procedure on a production system, we strongly recommend backing up your system. If you are using a virtual machine, take a snapshot of it before making any changes. Alternatively, back up the file systems being used by your system. Exact procedures for updating will depend on your local installation. At a minimum you should make backups of:
- The Storage Service SQLite (or MySQL) database
- The dashboard MySQL database
This is a simple example of backing up these two databases:
sudo cp /var/archivematica/storage-service/storage.db ~/storage_db_backup.db
mysqldump -u root -p MCP > ~/am_backup.sql
If you do not have a password set for the root user in MySQL, you can omit the -p flag. If there is a problem during the upgrade process, you can restore your MySQL database from this backup and try the upgrade again.
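For example, restoring from these backups could look like this (a sketch, assuming the backup file names created above and a dashboard database named MCP):
sudo cp ~/storage_db_backup.db /var/archivematica/storage-service/storage.db
mysql -u root -p MCP < ~/am_backup.sql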
If you are upgrading from Archivematica 1.8 or earlier to 1.9 or later, note that the supported Elasticsearch version changed from 1.x to 6.x. In that case we also recommend creating a backup of your Elasticsearch data, especially if you do not have access to the AIP storage locations on the local filesystem.
You can follow these steps to create a backup of Elasticsearch:
# Remove and recreate the folder that stores the backup
sudo rm -rf /var/lib/elasticsearch/backup-repo/
sudo mkdir -p /var/lib/elasticsearch/backup-repo/
sudo chown elasticsearch:elasticsearch /var/lib/elasticsearch/backup-repo/
# Allow Elasticsearch to write files to the backup
echo 'path.repo: ["/var/lib/elasticsearch/backup-repo"]' | sudo tee -a /etc/elasticsearch/elasticsearch.yml
# Restart Elasticsearch and wait for it to start
sudo service elasticsearch restart
sleep 60s
# Configure the ES backup repository
curl -XPUT "localhost:9200/_snapshot/backup-repo" -H 'Content-Type: application/json' -d \
'{
  "type": "fs",
  "settings": {
    "location": "./",
    "compress": true
  }
}'
# Take the actual backup, and copy it to a safe place
curl -X PUT "localhost:9200/_snapshot/backup-repo/am_indexes_backup?wait_for_completion=true"
cp -rf /var/lib/elasticsearch/backup-repo elasticsearch-backup
For more information, refer to the Elasticsearch 6.8 docs.
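To verify the snapshot, or to restore it later, you can use the same snapshot API (a sketch, assuming the repository and snapshot names used above):
# Check that the snapshot completed successfully
curl -X GET "localhost:9200/_snapshot/backup-repo/am_indexes_backup"
# Restore the snapshot (the affected indices must be closed or deleted first)
curl -X POST "localhost:9200/_snapshot/backup-repo/am_indexes_backup/_restore"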
Upgrade on Ubuntu packages¶
Update the operating system.
sudo apt-get update && sudo apt-get upgrade
Update package sources.
sudo sh -c 'echo "deb [arch=amd64] http://packages.archivematica.org/1.16.x/ubuntu jammy main" >> /etc/apt/sources.list'
sudo sh -c 'echo "deb [arch=amd64] http://packages.archivematica.org/1.16.x/ubuntu-externals jammy main" >> /etc/apt/sources.list'
Optionally you can remove the lines referencing packages.archivematica.org/1.15.x from /etc/apt/sources.list.
Update the Storage Service.
sudo apt-get update
sudo apt-get install archivematica-storage-service
Update Archivematica. During the update process you may be asked about updating configuration files. Choose to accept the maintainer's versions. You will also be asked about updating the database; say 'ok' to each of those steps. If you have set a password for the root MySQL database user, enter it when prompted.
sudo apt-get install archivematica-common
sudo apt-get install archivematica-dashboard
sudo apt-get install archivematica-mcp-server
sudo apt-get install archivematica-mcp-client
sudo apt-get install archivematica
Restart services.
sudo service archivematica-storage-service restart
sudo service gearman-job-server restart
sudo service archivematica-mcp-server restart
sudo service archivematica-mcp-client restart
sudo service archivematica-dashboard restart
sudo service nginx restart
Depending on your browser settings, you may need to clear your browser cache to make the dashboard pages load properly. For example, in Firefox or Chrome you should be able to bypass the cache with control-shift-R (or command-shift-R on macOS).
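Optionally, confirm which package versions are now installed (a quick check using the package names from the steps above):
apt-cache policy archivematica-dashboard archivematica-storage-service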
Upgrade on Rocky Linux/Red Hat packages¶
Upgrade the repositories for 1.16:
sudo sed -i 's/1.15.x/1.16.x/g' /etc/yum.repos.d/archivematica*
Remove the current installed version of ghostscript:
sudo rpm -e --nodeps ghostscript ghostscript-x11 \
    ghostscript-core ghostscript-fonts
Upgrade Archivematica packages:
sudo yum update
Apply the Archivematica database migrations:
sudo -u archivematica bash -c " \
set -a -e -x
source /etc/default/archivematica-dashboard || \
source /etc/sysconfig/archivematica-dashboard \
|| (echo 'Environment file not found'; exit 1)
cd /usr/share/archivematica/dashboard
/usr/share/archivematica/virtualenvs/archivematica/bin/python manage.py migrate --noinput
";
Apply the Storage Service database migrations:
Warning
In Archivematica 1.13 or newer, the default database backend is MySQL. Please follow our migration guide to move your data to a MySQL database before these migrations are applied.
If you want to continue using SQLite, please edit the environment configuration found in /etc/sysconfig/archivematica-storage-service. Comment out SS_DB_URL and indicate the path of the SQLite database with SS_DB_NAME, e.g.: SS_DB_NAME=/var/archivematica/storage-service/storage.db.
sudo -u archivematica bash -c " \
set -a -e -x
source /etc/default/archivematica-storage-service || \
source /etc/sysconfig/archivematica-storage-service \
|| (echo 'Environment file not found'; exit 1)
cd /usr/lib/archivematica/storage-service
/usr/share/archivematica/virtualenvs/archivematica-storage-service/bin/python manage.py migrate
";
Restart the Archivematica-related services, and continue using the system:
sudo systemctl restart archivematica-storage-service
sudo systemctl restart archivematica-dashboard
sudo systemctl restart archivematica-mcp-client
sudo systemctl restart archivematica-mcp-server
Depending on your browser settings, you may need to clear your browser cache to make the dashboard pages load properly. For example, in Firefox or Chrome you should be able to bypass the cache with control-shift-R (or command-shift-R on macOS).
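Optionally, confirm the installed versions and that the services came back up (a quick check; the package names are assumed to match the service names used above):
rpm -q archivematica-dashboard archivematica-storage-service
sudo systemctl --no-pager status archivematica-mcp-server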
Upgrade on Vagrant / Ansible¶
This upgrade method works with Vagrant machines, as well as with cloud-based virtual machines or physical servers.
Connect to your Vagrant machine or server.
vagrant ssh # Or ssh <your user>@<host>
Install Ansible.
sudo pip install ansible==2.9.10 jmespath jinja2==3.0.3
Check out the deployment repo:
git clone https://github.com/artefactual/deploy-pub.git
Go into the appropriate playbook folder and install the required roles.
Ubuntu 22.04 (Jammy):
cd deploy-pub/playbooks/archivematica-jammy
ansible-galaxy install -f -p roles/ -r requirements.yml
Rocky Linux 9:
cd deploy-pub/playbooks/archivematica-rocky9
ansible-galaxy install -f -p roles/ -r requirements.yml
All the following steps should be run from the respective playbook folder for your operating system.
Verify that vars-singlenode.yml has the appropriate contents for Elasticsearch and Archivematica, or update it with your own values.
Create a hosts file.
echo 'am-local ansible_connection=local' > hosts
Upgrade Archivematica by running:
ansible-playbook -i hosts singlenode.yml --tags=elasticsearch,archivematica-src
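If you want to preview what the playbook would change before applying it, you can try Ansible's check mode (a sketch; not every role necessarily supports check mode, so treat the output as indicative):
ansible-playbook -i hosts singlenode.yml --tags=elasticsearch,archivematica-src --check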
Upgrade in indexless mode¶
As of Archivematica 1.7, Archivematica can be run in indexless mode; that is, without Elasticsearch. Installing Archivematica without Elasticsearch, or with limited Elasticsearch functionality, means reduced consumption of compute resources and lower operational complexity. By setting the archivematica_src_search_enabled configuration attribute, administrators can define how much, if anything, Elasticsearch indexes. This can impact searching across several different dashboard pages.
Upgrade your existing Archivematica pipeline following the instructions above.
Modify the relevant systemd EnvironmentFile files by adding lines that set the relevant environment variables to false.
If you are using Ubuntu, run the following commands.
sudo sh -c 'echo "ARCHIVEMATICA_DASHBOARD_DASHBOARD_SEARCH_ENABLED=false" >> /etc/default/archivematica-dashboard' sudo sh -c 'echo "ARCHIVEMATICA_MCPSERVER_MCPSERVER_SEARCH_ENABLED=false" >> /etc/default/archivematica-mcp-server' sudo sh -c 'echo "ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_SEARCH_ENABLED=false" >> /etc/default/archivematica-mcp-client'
If you are using Rocky Linux, run the following commands.
sudo sh -c 'echo "ARCHIVEMATICA_DASHBOARD_DASHBOARD_SEARCH_ENABLED=false" >> /etc/sysconfig/archivematica-dashboard' sudo sh -c 'echo "ARCHIVEMATICA_MCPSERVER_MCPSERVER_SEARCH_ENABLED=false" >> /etc/sysconfig/archivematica-mcp-server' sudo sh -c 'echo "ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_SEARCH_ENABLED=false" >> /etc/sysconfig/archivematica-mcp-client'
Restart services.
If you are using Ubuntu, run the following commands.
sudo service archivematica-dashboard restart
sudo service archivematica-mcp-client restart
sudo service archivematica-mcp-server restart
If you are using Rocky Linux, run the following commands.
sudo -u root systemctl restart archivematica-dashboard
sudo -u root systemctl restart archivematica-mcp-client
sudo -u root systemctl restart archivematica-mcp-server
If you had previously installed and started the Elasticsearch service, you can turn it off now.
sudo -u root systemctl stop elasticsearch
sudo -u root systemctl disable elasticsearch
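To confirm the change took effect, you can inspect the environment files (a quick check, assuming the Ubuntu paths; use /etc/sysconfig/archivematica-* on Rocky Linux):
grep SEARCH_ENABLED /etc/default/archivematica-*
After the restart, search-backed dashboard features such as the Backlog and Archival Storage searches should no longer be available.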
Upgrade with output capturing disabled¶
As of Archivematica 1.7.1, output capturing can be disabled at upgrade or at any other time. This means the stdout and stderr from preservation tasks are not captured, which can result in a performance improvement. See the Task output capturing configuration page for more details. To disable output capturing, set the ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CAPTURE_CLIENT_SCRIPT_OUTPUT environment variable to false and restart the MCP Client process(es), as sketched below. Consult the installation instructions for your deployment method for more details on how to set environment variables and restart Archivematica processes.
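On an Ubuntu package install, for example, this could look as follows (a sketch mirroring the pattern used for the search variables above; on Rocky Linux use /etc/sysconfig/archivematica-mcp-client and systemctl instead):
sudo sh -c 'echo "ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CAPTURE_CLIENT_SCRIPT_OUTPUT=false" >> /etc/default/archivematica-mcp-client'
sudo service archivematica-mcp-client restart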
Update search indices¶
Note
Ignore this section if you are planning to run Archivematica without search indices.
Archivematica releases may introduce changes that require updating the search indices to function properly, e.g. Archivematica v1.12.0 introduced new fields to the search indices and made some changes to text field types. Please keep an eye on our release notes before you start the upgrade.
The update can be accomplished in one of two ways. Preferably, you can reindex the documents, which is usually faster because it reuses the documents you already have indexed. We would love to know if this does not work for you, but when that is the case, it is possible to recreate the indices, which takes much longer to complete because it accesses the original data, e.g. your AIPs.
Reindex the documents¶
In Elasticsearch, it is possible to add new fields to search indices but it is not possible to update existing ones. The recommended strategy is to create new indices with our desired mapping and reindex our documents. This is based on the Reindex API.
It is a multi-step process that we have automated with a script: es-reindex.sh. Please follow the link and read the instructions carefully.
Warning
Before you continue, we recommend backing up your Elasticsearch data. Please read the official docs for instructions.
Note
We may implement this script as a Django command in the future for better usability. For the time being, please download the script and tweak as needed.
Recreate the indices¶
This method will allow you to delete and rebuild the existing Elasticsearch indices so that all the Backlog and Archival Storage column fields are fully populated, including for transfers and AIPs ingested prior to the upgrade to Archivematica 1.16.0. Run the commands described in Rebuild the indexes to fully delete and rebuild the indices.
Execution example:
sudo -u archivematica bash -c " \
set -a -e -x
source /etc/default/archivematica-dashboard || \
source /etc/sysconfig/archivematica-dashboard \
|| (echo 'Environment file not found'; exit 1)
cd /usr/share/archivematica/dashboard
/usr/share/archivematica/virtualenvs/archivematica/bin/python \
manage.py rebuild_transfer_backlog --from-storage-service --no-prompt
";
sudo -u archivematica bash -c " \
set -a -e -x
source /etc/default/archivematica-dashboard || \
source /etc/sysconfig/archivematica-dashboard \
|| (echo 'Environment file not found'; exit 1)
cd /usr/share/archivematica/dashboard
/usr/share/archivematica/virtualenvs/archivematica/bin/python \
manage.py rebuild_aip_index_from_storage_service --delete-all
";
Note
Please note that the use of encrypted or remote Transfer Backlog and AIP Store locations may require rebuilding the indices from the Storage Service API rather than from the filesystem. At this time, it is not possible to rebuild the indices for all types of remote locations.
Note
Please note that executing these commands may take a long time for large AIP and Transfer Backlog storage locations, especially if the packages are stored compressed or encrypted, or if you are using a third-party service. If that is the case, you may want to reindex the Elasticsearch documents instead.
Review the processing configuration¶
After any Archivematica upgrade, it is recommended to perform a sanity check on your processing configurations. Look for new decision points where you want to establish a default, like the new “Scan for viruses” introduced in Archivematica 1.13.
The default and automated bundled configurations can be reset to the Archivematica defaults.
Migrate from MySQL 5.x to 8.x¶
It is recommended that the MySQL databases for Archivematica and the Storage Service use the MySQL 8 utf8mb4 character set and its default collation utf8mb4_0900_ai_ci (or utf8mb4_general_ci in MariaDB).
If you migrated your databases from MySQL 5.x, you can check the character set and collation of their tables with:
SELECT
t.table_schema, t.table_name, c.character_set_name, t.table_collation
FROM
information_schema.tables t,
information_schema.collation_character_set_applicability c
WHERE
c.collation_name = t.table_collation
AND t.table_type = 'BASE TABLE'
AND (t.table_schema = 'MCP' OR t.table_schema = 'SS');
If they use the utf8mb3 character set and collation, you should update them to avoid potential migration conflicts like this:
Running migrations:
Applying admin.0003_logentry_add_action_flag_choices... OK
Applying auth.0009_alter_user_last_name_max_length... OK
Applying auth.0010_alter_group_name_max_length... OK
Applying auth.0011_update_proxy_permissions... OK
Applying auth.0012_alter_user_first_name_max_length... OK
Applying locations.0031_rclone_space...Traceback (most recent call last):
File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/django/db/backends/mysql/base.py", line 73, in execute
return self.cursor.execute(query, args)
File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/MySQLdb/cursors.py", line 179, in execute
res = self._query(mogrified_query)
File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/MySQLdb/cursors.py", line 330, in _query
db.query(q)
File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/MySQLdb/connections.py", line 255, in query
_mysql.connection.query(self, query)
MySQLdb.OperationalError: (3780, "Referencing column 'space_id' and referenced column 'uuid' in foreign key constraint 'locations_rclone_space_id_adb7fd1d_fk_locations_space_uuid' are incompatible.")
django.db.utils.OperationalError: (3780, "Referencing column 'space_id' and referenced column 'uuid' in foreign key constraint 'locations_rclone_space_id_adb7fd1d_fk_locations_space_uuid' are incompatible.")
The following script can be used as a reference to update the character set of the databases and their tables.
#!/usr/bin/env bash
set -o errexit # abort on nonzero exitstatus
set -o nounset # abort on unbound variable
set -o pipefail # do not hide errors within pipes
# Array of database names
DATABASES=(
    MCP
    SS
)
# Collation and CHARSET
CHARSET="utf8mb4"
COLLATION="utf8mb4_0900_ai_ci"
# MySQL authentication (optional, default no auth)
MYSQL_USE_AUTH=False
MYSQL_USER=root
MYSQL_PASSWORD="THE_PASSWORD"
# Function to execute a query
execute_query() {
    local query="$1"
    local db_name="$2"
    local user_arg=""
    if [ "$MYSQL_USE_AUTH" = "True" ]; then
        user_arg="-u$MYSQL_USER"
        export MYSQL_PWD="$MYSQL_PASSWORD"
    fi
    mysql -N -B $user_arg -e "$query" "$db_name"
}
# Function to fix database charset and collation
fix_database_charset() {
    local query="ALTER DATABASE ${DB_NAME} CHARACTER SET $CHARSET COLLATE $COLLATION;"
    echo "Fixing database charset and collation"
    execute_query "$query" "$DB_NAME"
    echo "Fixed database charset and collation"
}
# Function to fix tables charset and collation
fix_tables_charset() {
    local query="SELECT CONCAT('ALTER TABLE \`', table_name, '\` CHARACTER SET $CHARSET COLLATE $COLLATION;') \
        FROM information_schema.TABLES AS T, information_schema.\`COLLATION_CHARACTER_SET_APPLICABILITY\` AS C \
        WHERE C.collation_name = T.table_collation \
        AND T.table_schema = '$DB_NAME' \
        AND (C.CHARACTER_SET_NAME != '$CHARSET' OR C.COLLATION_NAME != '$COLLATION');"
    local alter_table_queries=$(execute_query "$query" "$DB_NAME")
    alter_table_queries_no_foreign_key_checks=$(echo -e "SET FOREIGN_KEY_CHECKS=0;\n$alter_table_queries\nSET FOREIGN_KEY_CHECKS=1;")
    # echo "$alter_table_queries_no_foreign_key_checks"
    echo "Fixing tables charset and collation"
    execute_query "$alter_table_queries_no_foreign_key_checks" "$DB_NAME"
    echo "Fixed tables charset and collation"
}
# Function to fix column collation for varchar columns
fix_varchar_columns_collation() {
    local query="SELECT CONCAT('ALTER TABLE \`', table_name, '\` MODIFY \`', column_name, '\` ', DATA_TYPE, \
        '(', CHARACTER_MAXIMUM_LENGTH, ') CHARACTER SET $CHARSET COLLATE $COLLATION', \
        (CASE WHEN IS_NULLABLE = 'NO' THEN ' NOT NULL' ELSE '' END), ';') \
        FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = '$DB_NAME' AND DATA_TYPE = 'varchar' AND \
        ( CHARACTER_SET_NAME != '$CHARSET' OR COLLATION_NAME != '$COLLATION');"
    local alter_table_queries=$(execute_query "$query" "$DB_NAME")
    alter_table_queries_no_foreign_key_checks=$(echo -e "SET FOREIGN_KEY_CHECKS=0;\n$alter_table_queries\nSET FOREIGN_KEY_CHECKS=1;")
    # echo "$alter_table_queries_no_foreign_key_checks"
    echo "Fixing column collation for varchar columns"
    execute_query "$alter_table_queries_no_foreign_key_checks" "$DB_NAME"
    echo "Fixed column collation for varchar columns"
}
# Function to fix column collation for non-varchar columns
fix_non_varchar_columns_collation() {
    local query="SELECT CONCAT('ALTER TABLE \`', table_name, '\` MODIFY \`', column_name, '\` ', DATA_TYPE, ' \
        CHARACTER SET $CHARSET COLLATE $COLLATION', (CASE WHEN IS_NULLABLE = 'NO' THEN ' NOT NULL' ELSE '' END), ';') \
        FROM information_schema.COLUMNS \
        WHERE TABLE_SCHEMA = '$DB_NAME' \
        AND DATA_TYPE != 'varchar' \
        AND (CHARACTER_SET_NAME != '$CHARSET' OR COLLATION_NAME != '$COLLATION');"
    local alter_table_queries=$(execute_query "$query" "$DB_NAME")
    alter_table_queries_no_foreign_key_checks=$(echo -e "SET FOREIGN_KEY_CHECKS=0;\n$alter_table_queries\nSET FOREIGN_KEY_CHECKS=1;")
    # echo "$alter_table_queries_no_foreign_key_checks"
    echo "Fixing column collation for non-varchar columns"
    execute_query "$alter_table_queries_no_foreign_key_checks" "$DB_NAME"
    echo "Fixed column collation for non-varchar columns"
}
# Loop through each database in the array
for DB_NAME in "${DATABASES[@]}"; do
    echo "Processing database: $DB_NAME"
    fix_database_charset
    fix_tables_charset
    fix_varchar_columns_collation
    fix_non_varchar_columns_collation
    echo "Migration completed for $DB_NAME"
done
# Unset the MYSQL_PWD environment variable after executing the queries
unset MYSQL_PWD
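A possible way to run the script (a sketch, assuming you saved it as fix-charsets.sh and adjusted the CHARSET, COLLATION and MySQL authentication variables at the top first):
chmod +x fix-charsets.sh
./fix-charsets.sh
After it completes, re-run the SELECT query shown earlier to confirm that all tables in MCP and SS now report the utf8mb4 character set.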