Using Python to get EIDRs mapped to Alternate Ids

Hitesh Pau
Oct 9, 2020

Before you start reading this article, please read my previous article, Using Python to access the EIDR registry with EIDR REST APIs, to understand the initial setup.

Alternate ID in EIDR

The Alternate ID field is a key attribute in the EIDR metadata schema: it makes EIDR IDs interoperable with other existing ID systems. The field consists of a type and a value. For example, an Alternate ID could have a type of IMDB and a value of tt6723592. Proprietary IDs are supported as well, with an added domain attribute giving the domain within which the ID is valid (for example, a Proprietary ID with the domain themoviedb.org/movie and the value 577922).
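To make the type/value/domain structure concrete, here is a minimal sketch that models an Alternate ID as a small Python dataclass. The AlternateId class, and its rule that a Proprietary ID must carry a domain, are illustrative assumptions based on the description above; they are not part of any EIDR library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlternateId:
    """One Alternate ID: a type, a value and, for Proprietary IDs, a domain (illustrative model)."""
    type: str
    value: str
    domain: Optional[str] = None

    def __post_init__(self):
        # Assumption: a Proprietary ID is only meaningful together with its domain
        if self.type == "Proprietary" and not self.domain:
            raise ValueError("Proprietary Alternate IDs need a domain")


# The two flavours described above
imdb = AlternateId(type="IMDB", value="tt6723592")
tmdb = AlternateId(type="Proprietary", value="577922", domain="themoviedb.org/movie")
```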

The Alternate ID field is used by metadata vendors to link EIDR records to vendor IDs that reference external sources of commercial metadata for the asset. Studios or other content producers may cross-reference to internal IDs used for other distribution or tracking purposes. EIDR serves as a useful cross-referencing tool for access to a wide variety of external sources of data about each registered asset.

This article shows how to look up EIDR IDs using Alternate IDs as the input.

Pre-requisites

  1. You have Python installed. This tutorial uses Python 3.7.4. If Python is not installed, you can download it from https://www.python.org/ and follow the Python documentation to install it on your machine.
  2. You have a text editor installed on your machine. There are many Python-friendly editors that make writing Python code easier.

For Mac, you can use Sublime Text, Atom, TextWrangler

For Windows, you can use Notepad, Notepad++, Sublime Text, Atom

  3. You can also install Jupyter, a web-based interactive Python development environment. You can download it from https://jupyter.org/

Assumption

  1. You know basic Python programming.
  2. You are a member of EIDR and have credentials to access the EIDR Registry.

EIDR Rest APIs

The EIDR system provides various services through a REST-based interface over HTTP 1.1 (see RFC 2616).

Here is the link to the EIDR Rest API document

Note: Public services do not necessarily mean open access. Ingesting or registering data into EIDR is controlled, while reading data from EIDR is generally not restricted.

EIDR Resolution Service API

Refer to 2.3.1 Resolution service (page 14) in the EIDR Rest API document for the request call and parameters.

Let’s get started…

Refer to Using Python to access the EIDR registry with EIDR REST APIs to do the following -

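  1. Install the required Python packages (requests; beautifulsoup4 together with lxml, which BeautifulSoup needs for its 'xml' parser; and pandas with openpyxl for the Excel export)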
  2. Authenticate and authorize to access EIDR Rest APIs
import requests, base64, hashlib
from bs4 import BeautifulSoup
import pandas as pd

UserID = '10.5238/xxxxxxxx' # enter your EIDR User Id
Pwd = '************' # enter your EIDR password
PartyID = '10.5237/xxxxxxxxx' # enter your EIDR party ID
url = 'https://sandbox1.eidr.org:443/EIDR/' # EIDR Registry URL

#Build the password shadow: Base64 of the MD5 digest of the password (hashed, not encrypted)
PasswordShadow = base64.b64encode(hashlib.md5(Pwd.encode('utf-8')).digest()).decode('utf8')
auth_str = '%s:%s:%s' % (UserID, PartyID, PasswordShadow)
#Headers sent with every EIDR REST request
headers = {'Authorization' : 'Eidr {}'.format(auth_str), 'Accept': 'text/xml', 'Content-Type': 'text/xml'}
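The same header construction can be wrapped in a function so the credentials are passed in rather than kept as module-level variables. This is a minimal sketch; eidr_auth_headers is a hypothetical name, and it reproduces the 'Eidr UserID:PartyID:PasswordShadow' format used above. Note that the MD5 step is hashing, not encryption, and the resulting shadow works as a credential, so keep it out of shared notebooks and source control.

```python
import base64
import hashlib


def eidr_auth_headers(user_id, party_id, password):
    """Build the headers this tutorial sends with EIDR REST calls (hypothetical helper)."""
    # Password shadow: Base64 of the raw 16-byte MD5 digest of the UTF-8 password
    shadow = base64.b64encode(hashlib.md5(password.encode("utf-8")).digest()).decode("ascii")
    auth_str = "{}:{}:{}".format(user_id, party_id, shadow)
    return {
        "Authorization": "Eidr " + auth_str,
        "Accept": "text/xml",
        "Content-Type": "text/xml",
    }


# Usage with the placeholder credentials from above
headers = eidr_auth_headers("10.5238/xxxxxxxx", "10.5237/xxxxxxxxx", "************")
```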

Now we write a function that gets the EIDR record for an Alternate ID. Refer to 2.3.1 Resolution service (page 14) in the EIDR Rest API document for the request call and parameters.

def getAltIdData(altid):
    #Call the Resolution service with the altid parameter and return the raw XML bytes
    req = url + 'object/?altid=' + altid
    resp = requests.get(req, headers=headers)
    return resp.content
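getAltIdData builds the request URL by concatenating the raw Alternate ID. That works for IMDB IDs such as tt6723592, but a proprietary value containing characters like &, # or spaces would corrupt the query string. Below is a minimal sketch that percent-encodes the value with the standard library's urllib.parse.urlencode; altid_query_url is a hypothetical helper name. Passing params={'altid': altid} to requests.get encodes the value the same way.

```python
from urllib.parse import urlencode


def altid_query_url(base_url, altid):
    """Return the Resolution-service URL for one Alternate ID, with the value percent-encoded."""
    # urlencode escapes reserved characters, e.g. "&" becomes "%26"
    return base_url + "object/?" + urlencode({"altid": altid})


print(altid_query_url("https://sandbox1.eidr.org:443/EIDR/", "tt6723592"))
# prints https://sandbox1.eidr.org:443/EIDR/object/?altid=tt6723592
```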

Let’s test the function by passing an Alternate ID and looking at the response.

#pass the alternate id and get the EIDR response
eidr_altid_resp = getAltIdData('tt6723592')
#display the response xml
soup = BeautifulSoup(eidr_altid_resp, 'xml')
print(soup.prettify())

Here is the response XML

<?xml version="1.0" encoding="utf-8"?>
<FullMetadata xmlns="http://www.eidr.org/schema" xmlns:md="http://www.movielabs.com/schema/md/v2.8/md" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<BaseObjectData>
<ID>
10.5240/1654-3307-7C31-D453-B702-X
</ID>
<StructuralType>
Abstraction
</StructuralType>
<Mode>
AudioVisual
</Mode>
<ReferentType>
Movie
</ReferentType>
<ResourceName lang="en">
Tenet
</ResourceName>
<OriginalLanguage mode="Audio">
en
</OriginalLanguage>
<AssociatedOrg role="producer">
<md:DisplayName>
Syncopy
</md:DisplayName>
</AssociatedOrg>
<AssociatedOrg idType="EIDRPartyID" organizationID="10.5237/06DC-BE7D" role="producer">
<md:DisplayName>
Warner Bros Pictures
</md:DisplayName>
</AssociatedOrg>
<ReleaseDate>
2020
</ReleaseDate>
<CountryOfOrigin>
US
</CountryOfOrigin>
<Status>
valid
</Status>
<ApproximateLength>
PT1H50M
</ApproximateLength>
<AlternateID domain="commonsense.org/nid" xsi:type="Proprietary">
6193520
</AlternateID>
<AlternateID domain="commonsense.org/uuid" xsi:type="Proprietary">
48d84b91-cca1-4a25-8df8-ba26b2fd4c5a
</AlternateID>
<AlternateID relation="IsSameAs" xsi:type="IMDB">
tt6723592
</AlternateID>
<AlternateID domain="themoviedb.org/movie" relation="IsSameAs" xsi:type="Proprietary">
577922
</AlternateID>
<AlternateID domain="moviebuff.com" xsi:type="Proprietary">
2fb5d022-4036-4fe5-a3ce-ea9032643ebd
</AlternateID>
<Administrators>
<Registrant>
10.5237/8EED-9CB4
</Registrant>
</Administrators>
<Credits>
<Director>
<md:DisplayName>
Christopher Nolan
</md:DisplayName>
</Director>
<Actor>
<md:DisplayName>
John David Washington
</md:DisplayName>
</Actor>
<Actor>
<md:DisplayName>
Robert Pattinson
</md:DisplayName>
</Actor>
<Actor>
<md:DisplayName>
Kenneth Branagh
</md:DisplayName>
</Actor>
<Actor>
<md:DisplayName>
Michael Caine
</md:DisplayName>
</Actor>
</Credits>
</BaseObjectData>
</FullMetadata>

The response XML shows that the Alternate ID we passed resolved successfully to an EIDR ID (10.5240/1654-3307-7C31-D453-B702-X). We can extract the EIDR ID and its details using the BeautifulSoup package.

print('Title = {}'.format(soup.FullMetadata.BaseObjectData.ResourceName.contents[0]))
print('Type = {}'.format(soup.FullMetadata.BaseObjectData.ReferentType.contents[0]))
print('Release Year = {}'.format(soup.FullMetadata.BaseObjectData.ReleaseDate.contents[0]))
print('RunTime = {}'.format(soup.FullMetadata.BaseObjectData.ApproximateLength.contents[0]))
print('EIDR ID = {}'.format(soup.FullMetadata.BaseObjectData.ID.contents[0]))
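Attribute-style access such as soup.FullMetadata.BaseObjectData.AlternateID returns only the first matching element, while a record usually carries several Alternate IDs. Here is a minimal sketch that lists all of them using the standard library's xml.etree.ElementTree, swapped in for BeautifulSoup so it needs no extra packages. alternate_ids is a hypothetical helper; the namespace URIs are the ones declared on the FullMetadata root in the response above.

```python
import xml.etree.ElementTree as ET

# Namespaces declared on the FullMetadata root element of the response
NS = {"eidr": "http://www.eidr.org/schema"}
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def alternate_ids(xml_bytes):
    """Return every AlternateID in a FullMetadata response as (type, domain, value) tuples."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for el in root.findall("eidr:BaseObjectData/eidr:AlternateID", NS):
        # xsi:type holds the ID type; domain is only present on Proprietary IDs
        rows.append((el.get(XSI_TYPE), el.get("domain"), (el.text or "").strip()))
    return rows
```

For example, for alt_type, domain, value in alternate_ids(eidr_altid_resp): print(alt_type, domain or '', value) prints one line per Alternate ID of the resolved record.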

We may also run into scenarios where the Alternate ID you requested does not exist in the EIDR registry. In that case the registry returns a different XML document: a Response element whose Status carries a Details message, instead of FullMetadata.

For more details on other status codes and operation codes, please refer to 2.1.5 Codes and Descriptions (page 9) in the EIDR Rest API document.

Given these two possible responses, we need to extract the details conditionally, with a branch for each scenario.

#check the response and take action accordingly
if soup.Response is not None:
    print(soup.Response.Status.Details.contents[0])
elif soup.FullMetadata is not None:
    print('Title = {}'.format(soup.FullMetadata.BaseObjectData.ResourceName.contents[0]))
    print('Type = {}'.format(soup.FullMetadata.BaseObjectData.ReferentType.contents[0]))
    print('Release Year = {}'.format(soup.FullMetadata.BaseObjectData.ReleaseDate.contents[0]))
    print('RunTime = {}'.format(soup.FullMetadata.BaseObjectData.ApproximateLength.contents[0]))
    print('EIDR ID = {}'.format(soup.FullMetadata.BaseObjectData.ID.contents[0]))
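The two branches can also be folded into one function that returns a dictionary, which is easier to reuse in a loop. The sketch below swaps BeautifulSoup for the standard library's xml.etree.ElementTree and matches elements by their local name, so it does not depend on the namespace of the error Response (which this article does not show). It assumes, as the code above does, that an error response carries its message in a Details element; summarize_response and its dictionary keys are hypothetical names. A missing optional element such as ReleaseDate becomes an empty string instead of raising AttributeError.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    # "{http://www.eidr.org/schema}Response" -> "Response"; un-namespaced tags pass through
    return tag.rsplit("}", 1)[-1]


def _first_text(root, name):
    # Text of the first element (root included) with this local name, or "" if absent
    for el in root.iter():
        if _local(el.tag) == name:
            return (el.text or "").strip()
    return ""


def summarize_response(xml_bytes):
    """Classify a resolution response as a match (FullMetadata) or an error (Response)."""
    root = ET.fromstring(xml_bytes)
    kind = _local(root.tag)
    if kind == "Response":
        return {"found": False, "message": _first_text(root, "Details")}
    if kind == "FullMetadata":
        return {
            "found": True,
            "eidr_id": _first_text(root, "ID"),
            "title": _first_text(root, "ResourceName"),
            "type": _first_text(root, "ReferentType"),
            "release_year": _first_text(root, "ReleaseDate"),
            "runtime": _first_text(root, "ApproximateLength"),
        }
    return {"found": False, "message": "unexpected root element " + kind}
```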

Let’s write code that reads a text file with multiple Alternate IDs, gets the status/details for each, and exports the output to an Excel file.

Create a text file in a text editor and paste the Alternate IDs below, or add your own. Save the file as input_altids.txt.

tt6723592
tt3014284
tt0434409
tt7286456
tt0120789
tt0109279
tt0070034
tt0408345
tt1915581
tt0956038
tt0327137
tt0186566
tt0811080
tt0234829
tt0120873
48d84b91-cca1-4a25-8df8-ba26b2fd4c5a
71661e5e-70d4-4ab9-b043-4ead829dcd73
288041
B0047WJ11G
2051258
34895
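Reading the file with f.read().splitlines() keeps every line as-is, so a blank line in the middle of the file or trailing spaces pasted from a spreadsheet are sent to the registry as Alternate IDs. Here is a minimal sketch of a more forgiving reader; read_altids is a hypothetical helper that strips whitespace, skips blank lines, and drops repeated IDs while keeping the original order.

```python
def read_altids(path):
    """Read one Alternate ID per line, skipping blank lines and repeated IDs."""
    seen = set()
    altids = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            altid = line.strip()  # drop newline and stray spaces
            if altid and altid not in seen:
                seen.add(altid)
                altids.append(altid)
    return altids
```

For example, altids = read_altids('input_altids.txt') can stand in for the with open(...) block in the loop that follows.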

The code below does the following -

  1. Sets a variable with the filename. In our case, it is input_altids.txt.
  2. Opens the file and splits its contents into a list of Alternate IDs, one per line.
  3. Loops through each Alternate ID, calls the getAltIdData function that we created, and parses the XML response.
  4. Saves the results in a list.
  5. Creates a new pandas DataFrame from the results list.
#set the inputfile variable with the Alternate Id filename
inputFile = 'input_altids.txt'
#open the file and split each Alternate Id into a list object
with open(inputFile) as f:
    altids = f.read().splitlines()
#create an empty list to store the result/status
output_list = []
#loop through each Alternate Id to get the result/status
for altid in altids:

    #Call our getAltIdData function to get the result/status
    eidr_altid_resp = getAltIdData(altid)

    #transform the response into a readable XML object
    soup = BeautifulSoup(eidr_altid_resp, 'xml')

    #Check whether an EIDR record was returned or the Alternate ID does not exist in the EIDR registry
    if soup.Response is not None:
        #not found: store the status details in the Title column
        output_list.append([altid, soup.Response.Status.Details.contents[0], '', '', ''])
    elif soup.FullMetadata is not None:
        Title = soup.FullMetadata.BaseObjectData.ResourceName.contents[0]
        Type = soup.FullMetadata.BaseObjectData.ReferentType.contents[0]
        ReleaseYear = soup.FullMetadata.BaseObjectData.ReleaseDate.contents[0]
        RunTime = soup.FullMetadata.BaseObjectData.ApproximateLength.contents[0]
        #Save the result/status
        output_list.append([altid, Title, Type, ReleaseYear, RunTime])
    else:
        #neither response shape came back: record it instead of silently dropping the row
        output_list.append([altid, 'Unexpected response', '', '', ''])
#create a dataframe with the result/status list
df_output = pd.DataFrame(output_list, columns =['Alt ID', 'Title', 'Type', 'ReleaseYear', 'Runtime'])
#Show the top 5 rows of the dataframe (displayed when run in Jupyter)
df_output.head()

In Jupyter, df_output.head() displays the first five rows of the results.

Once the DataFrame is created, you can export it to Excel. Writing an .xlsx file with pandas requires an Excel engine such as openpyxl to be installed.

df_output.to_excel('AlternateId_status.xlsx')
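If no Excel engine is available, or you simply want a plain file, a minimal sketch using Python's standard csv module can write the same output_list rows to a CSV file that Excel opens. write_results_csv is a hypothetical helper; with pandas, df_output.to_csv(...) gives a similar result.

```python
import csv


def write_results_csv(rows, path, header=("Alt ID", "Title", "Type", "ReleaseYear", "Runtime")):
    """Write result rows to a CSV file that Excel can open."""
    # newline="" lets the csv module control line endings (avoids blank rows on Windows);
    # "utf-8-sig" adds a BOM so Excel detects UTF-8 titles correctly
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

For example, write_results_csv(output_list, 'AlternateId_status.csv'). Whichever route you take, passing index=False to pandas' to_excel or to_csv leaves out the DataFrame's numeric index column.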

Here is the complete Python code

import requests, base64, hashlib
from bs4 import BeautifulSoup
import pandas as pd

UserID = '10.5238/xxxxxxxx' # enter your EIDR User Id
Pwd = '************' # enter your EIDR password
PartyID = '10.5237/xxxxxxxxx' # enter your EIDR party ID
url = 'https://sandbox1.eidr.org:443/EIDR/' # EIDR Registry URL

#Build the password shadow: Base64 of the MD5 digest of the password (hashed, not encrypted)
PasswordShadow = base64.b64encode(hashlib.md5(Pwd.encode('utf-8')).digest()).decode('utf8')
auth_str = '%s:%s:%s' % (UserID, PartyID, PasswordShadow)
headers = {'Authorization' : 'Eidr {}'.format(auth_str), 'Accept': 'text/xml', 'Content-Type': 'text/xml'}

def getAltIdData(altid):
    #Call the Resolution service with the altid parameter and return the raw XML bytes
    req = url + 'object/?altid=' + altid
    resp = requests.get(req, headers=headers)
    return resp.content

#set the inputfile variable with the Alternate Id filename
inputFile = 'input_altids.txt'
#open the file and split each Alternate Id into a list object
with open(inputFile) as f:
    altids = f.read().splitlines()
#create an empty list to store the result/status
output_list = []
#loop through each Alternate Id to get the result/status
for altid in altids:

    #Call our getAltIdData function to get the result/status
    eidr_altid_resp = getAltIdData(altid)

    #transform the response into a readable XML object
    soup = BeautifulSoup(eidr_altid_resp, 'xml')

    #Check whether an EIDR record was returned or the Alternate ID does not exist in the EIDR registry
    if soup.Response is not None:
        output_list.append([altid, soup.Response.Status.Details.contents[0], '', '', ''])
    elif soup.FullMetadata is not None:
        Title = soup.FullMetadata.BaseObjectData.ResourceName.contents[0]
        Type = soup.FullMetadata.BaseObjectData.ReferentType.contents[0]
        ReleaseYear = soup.FullMetadata.BaseObjectData.ReleaseDate.contents[0]
        RunTime = soup.FullMetadata.BaseObjectData.ApproximateLength.contents[0]
        #Save the result/status
        output_list.append([altid, Title, Type, ReleaseYear, RunTime])
    else:
        #neither response shape came back: record it instead of silently dropping the row
        output_list.append([altid, 'Unexpected response', '', '', ''])
#create a dataframe with the result/status list
df_output = pd.DataFrame(output_list, columns =['Alt ID', 'Title', 'Type', 'ReleaseYear', 'Runtime'])
#Export the data into an Excel file
df_output.to_excel('AlternateId_status.xlsx')
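The loop above calls the registry once per Alternate ID with no timeout, stops at the first network error, and does not store the resolved EIDR ID in its table. For longer lists, here is a minimal sketch of a more defensive batch loop. resolve_batch, the injected fetch callback, the 0.5 second pause and the 30 second timeout in the commented wiring are all assumptions for illustration; this article does not state EIDR's actual rate limits. The request is passed in as a function so the parsing and error handling can be tried without network access.

```python
import time
import xml.etree.ElementTree as ET


def _local(tag):
    # Strip the "{namespace}" prefix that ElementTree puts on tag names
    return tag.rsplit("}", 1)[-1]


def _first_text(root, name):
    # Text of the first element with this local name, or "" if absent
    for el in root.iter():
        if _local(el.tag) == name:
            return (el.text or "").strip()
    return ""


def resolve_batch(altids, fetch, pause=0.5):
    """Resolve each Alternate ID with fetch(altid) -> XML bytes.

    Returns [altid, EIDR ID, title or message] rows and keeps going when one call fails.
    """
    rows = []
    for altid in altids:
        try:
            root = ET.fromstring(fetch(altid))
        except Exception as exc:  # network error, timeout or malformed XML
            rows.append([altid, "", "ERROR: {}".format(exc)])
        else:
            if _local(root.tag) == "FullMetadata":
                rows.append([altid, _first_text(root, "ID"), _first_text(root, "ResourceName")])
            else:
                # e.g. the Response/Status/Details document returned for unknown IDs
                rows.append([altid, "", _first_text(root, "Details")])
        time.sleep(pause)  # spread requests out instead of sending them back to back
    return rows

# Example wiring with the article's url and headers (not executed here):
# session = requests.Session()
# fetch = lambda a: session.get(url + 'object/', params={'altid': a},
#                               headers=headers, timeout=30).content
# rows = resolve_batch(altids, fetch)
# df = pd.DataFrame(rows, columns=['Alt ID', 'EIDR ID', 'Title or message'])
```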

This completes the tutorial on how to get EIDR details for multiple Alternate IDs and export them to Excel.

Also check out:

How to get the status for multiple EIDR tokens.
