Script Ingest
The pipeline
The Hocank data is mounted on several MPI servers, e.g., ssh.mpi.nl. Copy the archive to your machine and unpack the local copy:
scp danrhe@ssh.mpi.nl:/home/menwin/tmp/hocank-data.tgz .
tar xvfz hocank-data.tgz
You can start the Docker container with the Hocank data mounted at /lat. The host path is resolved to an absolute path for the bind mount:
docker run -p 80:80 -p 8443:8443 -v "$(cd ../../hocank-data && pwd)":/lat -t -i flat
Inside the container you start in the /app/flat directory. Run the following commands there:
mkdir src
cd src
ln -s /lat/Hocank .
cd ..
ln -s /lat/imdi-to-skip.txt .
./do-0-convert.sh # create an empty directory structure mirroring the source; create CMDI metadata files in the Metadata directory
./do-1-fox.sh # create a FOXML file for each file
./do-2-import.sh # ingest the data (Java-based) from the FOXML files
./do-3-config-cmd-gsearch.sh
./do-4-index.sh
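The five steps above must run in order, and a later step makes little sense if an earlier one failed. A minimal sketch of a wrapper that runs them in sequence and stops at the first failure is shown below. The function name `run_pipeline` is hypothetical, and it assumes the `do-*.sh` scripts sit in one directory (by default `/app/flat`, where the container starts) and that the `src` and `imdi-to-skip.txt` links already exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the FLAT ingest scripts in order and stop at the
# first one that fails. Assumes the do-*.sh scripts live in the given
# directory (default: /app/flat) and the src/ links are already in place.
run_pipeline() {
  local dir=${1:-/app/flat}
  local step
  for step in do-0-convert.sh do-1-fox.sh do-2-import.sh \
              do-3-config-cmd-gsearch.sh do-4-index.sh; do
    echo "==> $step"
    # Run each step from inside $dir, as in the manual instructions; the
    # subshell keeps the caller's working directory unchanged.
    if ! (cd "$dir" && "./$step"); then
      echo "step $step failed; stopping" >&2
      return 1
    fi
  done
}
```

Inside the container, after the `ln -s` commands, you would call `run_pipeline` (or `run_pipeline /app/flat`). Stopping early keeps a half-finished conversion from being imported and indexed.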
Now open your browser and browse the data in FLAT:
The default login is:
- admin/admin
Structure of CMDI
The CMDI file defines the structure of your project, so there is one CMDI file per project/measure. When a project is ingested, a FOXML object is created and ingested for every file, including the metadata files.
Each CMDI file has the following structure:
Header: defines the ID of the metadata file, along with administrative information such as its creator, creation date, and profile.
<MdCreator>name of the metadata creator</MdCreator>
<MdCreationDate>2007-11-26</MdCreationDate>
<MdSelfLink>hdl:use-unique-PID</MdSelfLink> <!-- the part after the colon becomes the PID; slashes and hyphens become underscores -->
<MdProfile>clarin.eu:cr1:p_1407745712035</MdProfile>
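The MdSelfLink rule in the header above is easy to check by hand. The following is a minimal bash sketch that applies it literally: it drops everything up to the first colon and then turns slashes and hyphens into underscores. The function name `selflink_to_pid` is hypothetical, and it is an assumption that this is the whole transformation. FLAT may add a namespace prefix or make further changes that are not described here.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the MdSelfLink -> PID rule described above.
# Assumption: only the prefix is dropped and / and - become _; FLAT itself
# may add a namespace or apply other changes.
selflink_to_pid() {
  local id=${1#*:}   # drop everything up to and including the first colon
  id=${id//\//_}     # slashes -> underscores
  id=${id//-/_}      # hyphens -> underscores
  printf '%s\n' "$id"
}
```

For example, `selflink_to_pid hdl:1839/00-0000-0000-0016-7DE4-4` prints `1839_00_0000_0000_0016_7DE4_4`.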
Resources: lists the files associated with the metadata file. Make sure to list both the resource files and the metadata file.
<ResourceProxy id="d333e522"> <!-- this id must be unique and is used later on -->
  <ResourceType mimetype="audio/x-mpeg3">Resource</ResourceType> <!-- the mimetype is important for rendering the file correctly -->
  <ResourceRef lat:localURI="/path/to/File.mp3">hdl:use-unique-PID</ResourceRef>
</ResourceProxy>
<ResourceProxy id="landingpage"> <!-- unclear whether this specific id is required to mark metadata files -->
  <ResourceType>LandingPage</ResourceType> <!-- the same is unclear for the type and ref -->
  <ResourceRef>hdl:1839/00-0000-0000-0016-7DE4-4</ResourceRef>
</ResourceProxy>
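Because every ResourceProxy id must be unique, a quick check before ingesting can catch copy-paste mistakes. The function below is a minimal sketch. Its name `check_unique_proxy_ids` is hypothetical, and it assumes each id is written as `<ResourceProxy id="...">` on a single line, as in the example above. An XML-aware tool would be more robust against other layouts.

```shell
#!/usr/bin/env bash
# Hypothetical check: report ResourceProxy ids that occur more than once in
# a CMDI file. Assumes each element is written as <ResourceProxy id="...">
# on one line; other attribute layouts are not detected.
check_unique_proxy_ids() {
  local file=$1 dups
  # Extract the id value of every ResourceProxy, then keep only duplicates.
  dups=$(grep -o '<ResourceProxy id="[^"]*"' "$file" \
           | sed 's/.*id="\([^"]*\)"/\1/' \
           | sort | uniq -d)
  if [ -n "$dups" ]; then
    printf 'duplicate ResourceProxy ids in %s:\n%s\n' "$file" "$dups" >&2
    return 1
  fi
}
```

Running `check_unique_proxy_ids project.cmdi` (with a file name of your choice) returns success when all ids are distinct and lists the offending ids otherwise.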
To generate FOXML from a customized CMDI file, link the user's workspace into the /app/flat/src directory and then run the scripts as described above.
To see the data on the Fedora server, change the following parameter in the Fedora configuration file (/var/www/fedora/server/config/fedora.fcfg):
<param name="ENFORCE-MODE" value="permit-all-requests"/>
Then (re)start the Fedora server so that the change takes effect:
/var/www/fedora/tomcat/bin/tomcat-fedora.sh start
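The configuration edit above can also be scripted, which helps when the container is rebuilt often. The following is a minimal sketch. The function name `set_enforce_mode_permit_all` is hypothetical, and it assumes the `ENFORCE-MODE` param sits on one line with `name` before `value`, as shown above. As the value's name suggests, `permit-all-requests` lets every request through, so it only suits a local test setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set Fedora's ENFORCE-MODE to permit-all-requests.
# Assumes <param name="ENFORCE-MODE" value="..."/> is on a single line with
# name before value; the original file is kept as <file>.bak.
set_enforce_mode_permit_all() {
  local cfg=${1:-/var/www/fedora/server/config/fedora.fcfg}
  # -i.bak edits in place and keeps a backup (works with GNU and BSD sed).
  sed -i.bak \
    's/\(<param name="ENFORCE-MODE" value="\)[^"]*"/\1permit-all-requests"/' \
    "$cfg"
  # Show the resulting line so the change can be checked by eye.
  grep 'name="ENFORCE-MODE"' "$cfg"
}
```

After running it, start Fedora with the tomcat-fedora.sh command above. If the printed line still shows the old value, the param is laid out differently and has to be edited by hand.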