Last year, we worked with Bryn Mawr College and the other Seven Sister schools to build College Women, an online collection of shared images and primary source materials from these historic women’s institutions.
This online repository collects images and metadata from seven different systems, each with their own way of presenting the data. For the second phase of the project, we decided to use REPOX to make the data more consistent.
REPOX is an open source data aggregation framework built by Europeana. It’s a great tool for importing, transforming, and exposing data from a variety of sources. For a project like College Women, REPOX allows us to make our data consistent before importing it into the site, saving time harvesting in Drupal and making sure data is correct before it enters the content management system. We’ve documented our experience to help you get your REPOX project up and running online. These instructions are for anyone who has some comfort with the command line, and this process shouldn’t take more than thirty minutes (down from the five hours it took to do before creating this tutorial!).
Setting up a server
We used Amazon Web Services (AWS) to deploy REPOX, because the pricing is flexible and we could spin up a server quickly. If you haven’t used AWS before, it can feel a little daunting, so let’s start at the beginning.
Go to aws.amazon.com and create an account with all appropriate billing information. Once you have an account, tap on “Services” and select Elastic Computing (EC2). EC2 is a service that allows you to bring up and down virtual servers, so you pay for what you need when you need it.
From the EC2 dashboard, select “Instances” and then click “Launch Instance”. This will take you through a step-by-step process of setting up the virtual server that we need to run REPOX.
The first step in this process is to select an Amazon Machine Image (or AMI). This image is the foundation for your system, including operating system, pre-install software, and other utilities. After experimenting with a few, we chose the “Tomcat Powered by Bitnami (PV)” AMI available in the AMI Marketplace. This instance had most of the things we need (Apache Tomcat, Java, and MySQL) already configured. Once you’ve selected this AMI, choose your instance type—we suggest an m3.medium or equivalent to ensure you have enough memory to install and run REPOX.
NOTE: We had originally hoped to run REPOX on a t2.micro instance to take advantage of the free tier pricing, but we found out during installation that the server didn’t have enough memory to complete the installation, and generally ran too slow after we got everything up and running.
From here, you can typically skip the additional configuration options and go right into launching your instance. Set up a new key pair so you can securely log into the server—enter a name, click download, and then “Launch Instances”. Be sure to keep the private key that it downloads safe, you’ll need it later!
Now that your AWS instance is up and running, you’ll see the new instance running in your EC2 Management Console. Take note of a few things here: the Public DNS address (you’ll need this to login later) and the Public IP address (you’ll need this to test things out on your server). If you go to the public IP address in your browser, you should see the Tomcat “congratulations!” screen with a number of resources and links.
We’ll also need to get the default password for our instance, which we’ll use later on to log into our MySQL database. From our Instances dashboard, click “Actions”, then “Instance Settings” and “Get System Log”. This will bring up a terminal window with the log of our server spinning up for the first time. Scroll down to the bottom and write down the highlighted application password.
Installing REPOX’s Dependencies
You’re going to run all of your commands from the terminal, and the first thing you need to do is securely connect to your server:
ssh -i /path/to/privatekey.pem firstname.lastname@example.org
The command above connects you to the newly created server with your selected private key. The default user for any Bitnami AMI is typically “bitnami”, and you’ll use the public server address that you made note of earlier. Once connected, you’re ready to start installing REPOX.
REPOX requires a few things to run: Java 7 or later, Apache Tomcat and Maven, and a MySQL or PostGRES database. The Amazon Machine Image has several of these already installed—all you need to do is install Git, Maven, and a mail server (we’re using Gmail), and you can start the REPOX install process.
First, install Git, which you’ll need to pull down the project’s codebase. You can do this by running:
sudo apt-get update sudo apt-get install git
This will download, run, and install Git and all of the required dependencies. If you’re not able to find the package to install Git, you can also run
sudo apt-get updateto make sure you have the latest library to pull from. With Git installed, we can now close the repository from GitHub. In your current directory, clone the repo with:
git clone https://github.com/europeana/REPOX.git
This will download all the code you need to install REPOX on the server into a new folder. Before you get started on setting up and installing REPOX, you’ll need to install Apache Maven, a tool for managing and building Java-based projects.
sudo apt-get install maven
Maven can take a while to install and will prompt you for confirmations along the way. When done, double check that everything installed successfully:
Now you can start the install process for REPOX. First, you need a database for your tool to use. To use phpMyAdmin, you’ll need to connect securely to the server. In a new terminal window or command prompt, type:
ssh -N -L 8888:127.0.0.1:80 -i /path/to/privatekey.pem email@example.com
This will let you access phpMyAdmin locally by typing “http://127.0.0.1:8888/phpmyadmin” in your web browser. From here, you can log into phpMyAdmin using the username
rootand the password you saved earlier from the AWS console. You’ll need to create a new database— call yours “repoxdb” so it’s easy to remember.
Now, you can configure REPOX for installation. From your current directory, you need to navigate to the
configuration.propertiesfile and edit your settings:
cd REPOX/resources/src/main/resources sudo nano configuration.properties
This will open up your editor so you can change your configuration settings. You’ll want to edit the “Advanced Properties” at the bottom of this file, particularly adding in your own gmail account information for email and updating your database settings. Be sure to comment out lines 49–52 and uncomment lines 61–64 since you’re using MySQL, and add in your own username and password for MySQL here!
Once set, it’s finally time to get REPOX installed. Navigate back up to the root directory:
And then run the installation command with Maven to start the install process:
mvn clean install - Pproduction
The Maven installation can take some time—expect to let it do its thing for about 10 minutes. Fortunately, running this on a more powerful EC2 instance will make it go a bit faster and avoid running out of memory. Once finished, you’ll need to copy the .war file that Maven creates into our Tomcat webapps directory:
sudo cp gui/target/repox.war /opt/bitnami/apache-tomcat/webapps/
And, last but not least, restart your Tomcat server:
cd /opt/bitnami/apache-tomcat/bin sudo ./catalina.sh stop sudo ./catalina.sh start
Now that REPOX has installed successfully, you can use the public IP address to log in. If you go to ‘/repox’ you’ll get directed to your REPOX instance, where you can log in using the default administrative account (with the clever “admin” username and “admin” password combination).
Once logged in, I would suggest changing your default admin password and creating some user accounts for people on your team. Before you can do that, you’ll need to set up your mail server information, which can simply use a gmail account you already have running. In the top bar, click “Administration” and “User Management” to add a new user. Now you’re good to go!
Now that you’re up and running with REPOX, you can start importing, transforming, and publishing your data in powerful ways. Let us know if you found this tutorial helpful or have any suggestions on improvements. Hopefully this makes the process of spinning up an instance of REPOX a little easier for you and your organization!