Using Python to access the EIDR registry with EIDR REST APIs

Hitesh Pau
7 min read · Aug 14, 2020

What is EIDR

EIDR is a universal unique identifier system for movie and television assets. From top-level franchises, titles, edits, and collections to series, seasons, episodes, and clips, EIDR provides globally unique identifiers for the entire range of singular and serial audiovisual object types that are relevant to both commercial and non-commercial works.

You can learn more at https://eidr.org/

EIDR REST APIs

The EIDR system provides various services using a REST-based interface in combination with HTTP 1.1 (see RFC 2616).

Refer to the EIDR REST API document for the full specification.

Note: Public services do not necessarily mean open access. Ingesting or registering data into EIDR is controlled while reading data from EIDR is generally not restricted.

Using Python to access the EIDR Registry through the EIDR REST API

Pre-requisites

  1. You have Python installed. This tutorial uses Python version 3.7.4. If you do not have Python, you can download it from https://www.python.org/ and follow the Python documentation to install it on your machine.
  2. You have a text editor installed on your machine. Many Python-friendly editors make writing Python code easier. On Mac, you can use SublimeText, Atom, or TextWrangler. On Windows, you can use Notepad, Notepad++, SublimeText, or Atom.
  3. Optionally, you can also install Jupyter, a web-based interactive Python development environment. You can download it from https://jupyter.org/

Assumptions

  1. You know basic Python programming.
  2. You are a member of EIDR and have credentials to access the EIDR Registry.

Let’s get started…

First things first: authentication and authorization to log into EIDR. As described in section 2.1.2 of the EIDR REST API document (page 8), we need to establish a connection with the EIDR registry. To do so, let’s install the requests Python package:

pip install requests

The base64 and hashlib modules are included in the Python installation.

These packages, along with your credentials, are required to establish a connection. Once requests is installed successfully, you can import all three in your code:

import requests, base64, hashlib

Set up your EIDR credentials:

UserID = '10.5238/xxxxxxxx' # enter your EIDR User Id
Pwd = '************' # enter your EIDR password
PartyID = '10.5237/xxxxxxxxx' # enter your EIDR party ID
url = 'https://sandbox1.eidr.org:443/EIDR/' # EIDR Registry URL

You should note that testing/training should be done in the sandbox registry (https://sandbox1.eidr.org:443/EIDR) and that only tested and verified code should be used in production (https://registry1.eidr.org:443/EIDR).
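
Hard-coding credentials in a script makes them easy to leak, and it is easy to forget which registry a script points at. As a minimal sketch, the settings above could instead be read from environment variables. The variable names (EIDR_USER_ID, EIDR_PASSWORD, EIDR_PARTY_ID, EIDR_ENV) and the load_eidr_settings helper are assumptions made for this illustration, not something EIDR defines; only the two registry URLs come from this article.

```python
import os

# Registry URLs as given in this article.
REGISTRY_URLS = {
    "sandbox": "https://sandbox1.eidr.org:443/EIDR/",
    "production": "https://registry1.eidr.org:443/EIDR/",
}


def load_eidr_settings(env=None):
    """Read EIDR settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    target = env.get("EIDR_ENV", "sandbox")  # default to the sandbox for safety
    if target not in REGISTRY_URLS:
        raise ValueError("EIDR_ENV must be 'sandbox' or 'production'")
    return {
        "UserID": env["EIDR_USER_ID"],    # raises KeyError if not set
        "Pwd": env["EIDR_PASSWORD"],
        "PartyID": env["EIDR_PARTY_ID"],
        "url": REGISTRY_URLS[target],
    }
```

Defaulting to the sandbox means a script only touches production when EIDR_ENV is set explicitly.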

Encode the credentials as documented in section 2.1.2 of the EIDR REST API document (page 8). The password is hashed with MD5 and the resulting digest is Base64-encoded:

PasswordShadow = base64.b64encode(hashlib.md5(Pwd.encode('utf-8')).digest()).decode('utf8')
auth_str = '%s:%s:%s' % (UserID, PartyID, PasswordShadow)
headers = {'Authorization' : 'Eidr {}'.format(auth_str), 'Accept': 'text/xml', 'Content-Type': 'text/xml'}
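
The three lines above can also be wrapped in a small function, which makes the header construction easier to reuse and to check in isolation. This is a sketch only: the function name build_eidr_headers is made up here, and the header format is simply the one used in this article.

```python
import base64
import hashlib


def build_eidr_headers(user_id, party_id, password):
    """Build the request headers used in this article's authentication step."""
    # MD5-hash the password, then Base64-encode the raw 16-byte digest.
    digest = hashlib.md5(password.encode("utf-8")).digest()
    password_shadow = base64.b64encode(digest).decode("ascii")
    auth_str = "{}:{}:{}".format(user_id, party_id, password_shadow)
    return {
        "Authorization": "Eidr {}".format(auth_str),
        "Accept": "text/xml",
        "Content-Type": "text/xml",
    }
```

With the variables defined earlier, the call would be headers = build_eidr_headers(UserID, PartyID, Pwd).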

Now we write our first function to get EIDR data for an EIDR ID. We pass the EIDR Id as the argument to the function. The function uses the requests package to get the EIDR information.

Refer to the Resolution Service, section 2.3.3 on page 21 of the EIDR REST API document, for more details on the request.

def getEIDRData(id):
    req = url + 'object/' + id + '?type=Full&followAlias=true'
    resp = requests.get(req, headers=headers)
    #print(resp.content)
    return resp.content

Once we write the function, we can call the getEIDRData function to pass an EIDR ID and get the EIDR response.

eidr_resp = getEIDRData('10.5240/0EF3-54F9-2642-0B49-6829-R')
eidr_resp
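
The function above returns whatever the server sends back, including the body of an error response. A slightly more defensive variant might build the query string with urllib.parse and raise on HTTP errors. This is only a sketch: the names build_object_url and get_eidr_data and the http_get parameter are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
from urllib.parse import urlencode


def build_object_url(base_url, eidr_id):
    """Build the resolution URL used in this article for one EIDR ID."""
    query = urlencode({"type": "Full", "followAlias": "true"})
    return "{}/object/{}?{}".format(base_url.rstrip("/"), eidr_id, query)


def get_eidr_data(eidr_id, base_url, headers, http_get, timeout=30):
    """Fetch one record; http_get is a callable such as requests.get."""
    resp = http_get(build_object_url(base_url, eidr_id), headers=headers, timeout=timeout)
    resp.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return resp.content
```

Passing the HTTP function in as an argument keeps the sketch testable without network access; in the tutorial's setup the call would be get_eidr_data('10.5240/0EF3-54F9-2642-0B49-6829-R', url, headers, requests.get).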

The response is in XML format, but it is not easily readable. We can use another Python package, Beautiful Soup, to format it.

pip install beautifulsoup4

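Once installed successfully, import the package and call the prettify method to format the XML. Note that the 'xml' parser used here relies on the lxml package, so install it as well (pip install lxml) if it is not already present.

from bs4 import BeautifulSoup
soup = BeautifulSoup(eidr_resp, 'xml')
print(soup.prettify())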

With BeautifulSoup, you can read data for specific elements of the XML:

print('Type = {}'.format(soup.FullMetadata.BaseObjectData.ReferentType.contents[0]))
print('Title = {}'.format(soup.FullMetadata.BaseObjectData.ResourceName.contents[0]))
print('Release Year = {}'.format(soup.FullMetadata.BaseObjectData.ReleaseDate.contents[0]))
print('RunTime = {}'.format(soup.FullMetadata.BaseObjectData.ApproximateLength.contents[0]))
print('EIDR ID = {}'.format(soup.FullMetadata.BaseObjectData.ID.contents[0]))
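
Chained attribute access such as soup.FullMetadata.BaseObjectData.ApproximateLength.contents[0] raises an AttributeError when an element is missing from a record. One way to guard against that, sketched here with the standard library's xml.etree.ElementTree rather than BeautifulSoup, is a small helper that returns None for absent fields. The first_text helper and the sample XML are made up for illustration; a real EIDR response may use namespaces and contains many more elements.

```python
import xml.etree.ElementTree as ET


def first_text(root, name):
    """Return the text of the first element whose local tag name is `name`, or None."""
    for element in root.iter():
        # Drop a leading '{namespace}' prefix, if any, before comparing.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == name:
            return element.text
    return None


# Simplified, made-up sample; not an actual EIDR registry response.
sample = (
    "<FullMetadata><BaseObjectData>"
    "<ID>10.5240/0EF3-54F9-2642-0B49-6829-R</ID>"
    "<ResourceName>Example Title</ResourceName>"
    "<ReferentType>Movie</ReferentType>"
    "</BaseObjectData></FullMetadata>"
)
root = ET.fromstring(sample)
print("Title =", first_text(root, "ResourceName"))
print("RunTime =", first_text(root, "ApproximateLength"))  # None: element absent in the sample
```

Because the helper returns the first match in document order, it suits element names that appear only once in a record; names that can repeat would need a more specific lookup.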

Below is the complete code:

import requests, base64, hashlib
from bs4 import BeautifulSoup
UserID = '10.5238/xxxxxxxx' # enter your EIDR User Id
Pwd = '************' # enter your EIDR password
PartyID = '10.5237/xxxxxxxxx' # enter your EIDR party ID
url = 'https://sandbox1.eidr.org:443/EIDR/' # EIDR Registry URL
# Hash the password with MD5 and Base64-encode the digest
PasswordShadow = base64.b64encode(hashlib.md5(Pwd.encode('utf-8')).digest()).decode('utf8')
auth_str = '%s:%s:%s' % (UserID, PartyID, PasswordShadow)
headers = {'Authorization' : 'Eidr {}'.format(auth_str), 'Accept': 'text/xml', 'Content-Type': 'text/xml'}
# getEIDRData function: fetch the full record for one EIDR ID
def getEIDRData(id):
    req = url + 'object/' + id + '?type=Full&followAlias=true'
    resp = requests.get(req, headers=headers)
    #print(resp.content)
    return resp.content
# Get the EIDR ID details by calling the getEIDRData function
eidr_resp = getEIDRData('10.5240/0EF3-54F9-2642-0B49-6829-R')
# Print the raw response from EIDR
print(eidr_resp)
# Format the EIDR response
soup = BeautifulSoup(eidr_resp, 'xml')
print(soup.prettify())
# Extract the specific EIDR values you need from the response using the BeautifulSoup package
print('Type = {}'.format(soup.FullMetadata.BaseObjectData.ReferentType.contents[0]))
print('Title = {}'.format(soup.FullMetadata.BaseObjectData.ResourceName.contents[0]))
print('Release Year = {}'.format(soup.FullMetadata.BaseObjectData.ReleaseDate.contents[0]))
print('RunTime = {}'.format(soup.FullMetadata.BaseObjectData.ApproximateLength.contents[0]))
print('EIDR ID = {}'.format(soup.FullMetadata.BaseObjectData.ID.contents[0]))

This completes the basics of using Python to access the EIDR registry with EIDR REST APIs.

Bonus Material

The code above looks up only one EIDR ID. In practice, you will often need to look up many EIDR IDs at once.

Let’s extend the above code to read a file with multiple EIDR IDs and pull the data for each from the EIDR Registry.

Create a new text file with a list of EIDR IDs. Open a text editor and paste the list of EIDR IDs below, or create your own list.

10.5240/2B8B-96D7-3142-0F17-C4F1-K
10.5240/2B95-4875-AFEC-2468-BC0C-1
10.5240/2B9B-79F1-F1E6-2FE6-2295-3
10.5240/2B9E-40B4-F563-F31A-1C97-U
10.5240/2B9F-EBC3-0F7F-7112-E388-Z
10.5240/2BA3-6BFA-95D0-7FEF-1DC5-F
10.5240/2BA3-F378-3E87-F338-C560-E
10.5240/2BA9-45B1-2D8D-2A67-C531-Z
10.5240/2BA9-5306-B286-E868-93B8-6
10.5240/35C9-4085-4FE9-8ACA-F464-W
10.5240/35DA-5397-4068-053C-2B23-2
10.5240/35E4-0485-8BA4-FBFE-BC47-3
10.5240/35E5-49E7-7365-B0B1-3848-D
10.5240/35FF-B72E-F96D-7B50-9180-F
10.5240/3613-72AB-2916-044E-8E1C-A
10.5240/3619-02F2-54DF-52A9-B545-Y
10.5240/361A-D1B7-5127-BD94-2196-N
10.5240/3620-6102-F372-9FA5-F26B-A
10.5240/3620-F84D-C5C3-ABA8-9BCE-E
10.5240/3621-D010-2660-5F34-883C-Y

Save the file in the same folder as the Python code file and name it input_eidrs.txt.

Now install pandas, a popular Python package for working with tabular data:

pip install pandas

Import Pandas in the code

import pandas as pd

Read the input file and split it into a Python list object called eidrs, with one EIDR ID per entry:

inputFile = 'input_eidrs.txt'
with open(inputFile) as f:
    eidrs = f.read().splitlines()
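
Text files edited by hand often pick up trailing spaces or blank lines, and an ID with a stray space would later fail a strict length check. A small cleanup step, sketched below with the hypothetical helper names clean_ids and read_eidr_ids, strips whitespace and drops empty lines before the IDs are used.

```python
def clean_ids(lines):
    """Strip surrounding whitespace from each line and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]


def read_eidr_ids(path):
    """Read one ID per line from a text file, cleaned with clean_ids."""
    with open(path, encoding="utf-8") as f:
        return clean_ids(f)
```

With the file from this tutorial, the call would be eidrs = read_eidr_ids('input_eidrs.txt').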

The code below does the following:

  1. Creates a list object to store the EIDR data results
  2. Loops through each EIDR ID and passes the ID to the getEIDRData function
  3. Extracts the required EIDR data values and stores them in the output_list object

output_list = []
for e in eidrs:
    if len(e) == 34:  # an EIDR ID is 34 characters long; anything else is treated as invalid
        eidr_resp = getEIDRData(e)
        soup = BeautifulSoup(eidr_resp, 'xml')

        Title = soup.FullMetadata.BaseObjectData.ResourceName.contents[0]
        Type = soup.FullMetadata.BaseObjectData.ReferentType.contents[0]
        ReleaseYear = soup.FullMetadata.BaseObjectData.ReleaseDate.contents[0]
        RunTime = soup.FullMetadata.BaseObjectData.ApproximateLength.contents[0]

        output_list.append([e, Title, Type, ReleaseYear, RunTime])
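
The length check above accepts any 34-character string. A somewhat stricter test, sketched here with a regular expression, also checks the shape of the ID. The pattern is inferred from the sample IDs in this article rather than taken from the EIDR specification, the name looks_like_eidr_id is made up, and the final check character is not verified.

```python
import re

# Pattern inferred from the sample IDs in this article (an assumption): the
# 10.5240 prefix, five groups of four uppercase hexadecimal digits, and one
# final check character.
EIDR_ID_PATTERN = re.compile(r"10\.5240/(?:[0-9A-F]{4}-){5}[0-9A-Z]")


def looks_like_eidr_id(value):
    """Return True if `value` has the shape of the IDs used in this article."""
    return EIDR_ID_PATTERN.fullmatch(value) is not None


print(looks_like_eidr_id("10.5240/2B8B-96D7-3142-0F17-C4F1-K"))  # True
print(looks_like_eidr_id("x" * 34))  # False, although the length is 34
```

In the loop, if looks_like_eidr_id(e): could stand in for the length comparison.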

We can use the output_list to create a pandas dataframe and display the top 5 records of the dataframe

df_output = pd.DataFrame(output_list, columns =['ID', 'Title', 'Type', 'ReleaseYear', 'Runtime'])
df_output.head()

You can export the dataframe output to an Excel file. Writing .xlsx files from pandas relies on an Excel engine such as openpyxl being installed:

df_output.to_excel('eidr_data.xlsx')
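
If an Excel file is not strictly required, the same rows can be written as CSV using only the standard library. The sketch below assumes the row layout of output_list from this tutorial; the function name write_rows_as_csv and the demo row are invented for illustration.

```python
import csv
import io

COLUMNS = ["ID", "Title", "Type", "ReleaseYear", "Runtime"]


def write_rows_as_csv(rows, out_file):
    """Write a header row followed by the collected rows to an open text file."""
    writer = csv.writer(out_file)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Placeholder row for demonstration only; not real registry data.
demo_rows = [["10.5240/XXXX-XXXX-XXXX-XXXX-XXXX-X", "Example Title", "Movie", "2020", "PT1H30M"]]
buffer = io.StringIO()
write_rows_as_csv(demo_rows, buffer)
print(buffer.getvalue())
```

To write to disk, open the target with open('eidr_data.csv', 'w', newline='') and pass the file object instead of the in-memory buffer. pandas offers the same through df_output.to_csv('eidr_data.csv', index=False).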

The complete bonus code is here:

import requests, base64, hashlib
from bs4 import BeautifulSoup
import pandas as pd
UserID = '10.5238/xxxxxxxx' # enter your EIDR User Id
Pwd = '************' # enter your EIDR password
PartyID = '10.5237/xxxxxxxxx' # enter your EIDR party ID
url = 'https://sandbox1.eidr.org:443/EIDR/' # EIDR Registry URL
# Hash the password with MD5 and Base64-encode the digest
PasswordShadow = base64.b64encode(hashlib.md5(Pwd.encode('utf-8')).digest()).decode('utf8')
auth_str = '%s:%s:%s' % (UserID, PartyID, PasswordShadow)
headers = {'Authorization' : 'Eidr {}'.format(auth_str), 'Accept': 'text/xml', 'Content-Type': 'text/xml'}
# getEIDRData function: fetch the full record for one EIDR ID
def getEIDRData(id):
    req = url + 'object/' + id + '?type=Full&followAlias=true'
    resp = requests.get(req, headers=headers)
    #print(resp.content)
    return resp.content
# Read the input file into a list of EIDR IDs
inputFile = 'input_eidrs.txt'
with open(inputFile) as f:
    eidrs = f.read().splitlines()
# List object to store the EIDR response values
output_list = []
# Loop through each EIDR ID
for e in eidrs:
    # Check the length of the EIDR ID; if it is not 34, it is an invalid EIDR ID
    if len(e) == 34:
        # Call getEIDRData for each EIDR ID
        eidr_resp = getEIDRData(e)
        # Parse the XML output
        soup = BeautifulSoup(eidr_resp, 'xml')
        # Extract the required values from the EIDR XML
        Title = soup.FullMetadata.BaseObjectData.ResourceName.contents[0]
        Type = soup.FullMetadata.BaseObjectData.ReferentType.contents[0]
        ReleaseYear = soup.FullMetadata.BaseObjectData.ReleaseDate.contents[0]
        RunTime = soup.FullMetadata.BaseObjectData.ApproximateLength.contents[0]
        # Save the extracted values in a list object
        output_list.append([e, Title, Type, ReleaseYear, RunTime])
# Use the output list object to create a new dataframe
df_output = pd.DataFrame(output_list, columns=['ID', 'Title', 'Type', 'ReleaseYear', 'Runtime'])
# Display the top 5 records of the dataframe (print is needed when running as a script)
print(df_output.head())
# Export the dataframe to Excel
df_output.to_excel('eidr_data.xlsx')
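
In the loop above, a single failed request or a record that lacks one of the four fields stops the whole run. One way around this, shown as a minimal sketch, is to catch errors per ID and report them at the end. The names collect_rows and fetch_row are hypothetical; fetch_row stands for any function that takes an EIDR ID and returns one output row.

```python
def collect_rows(eidr_ids, fetch_row):
    """Call fetch_row(eidr_id) for each ID, separating successes from failures."""
    rows, failures = [], []
    for eidr_id in eidr_ids:
        try:
            rows.append(fetch_row(eidr_id))
        except Exception as exc:  # broad on purpose: record the problem and move on
            failures.append((eidr_id, repr(exc)))
    return rows, failures


# Stand-in fetch function for demonstration; it does not contact the registry.
def fake_fetch(eidr_id):
    if eidr_id == "bad":
        raise ValueError("no such record")
    return [eidr_id, "Example Title"]


rows, failures = collect_rows(["id-1", "bad", "id-2"], fake_fetch)
print(len(rows), "rows,", len(failures), "failures")
```

The rows list can then be passed to pd.DataFrame as before, while failures can be printed or saved for a later retry.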

This completes how to get EIDR data for multiple EIDR IDs and export it to Excel.

Also check out:

How to get status for multiple EIDR tokens

How to get EIDR data from Alternate IDs
