BlueDolphin ODATA API with Python and Pandas
Introduction
This is the first tutorial in a small series on programmatic access to BlueDolphin's data and functionality. The goal is to show how to use BlueDolphin's ODATA API to retrieve data from Python, and how to perform basic analysis of BlueDolphin object data with the Pandas library.
Prepare your Python environment
This tutorial assumes that you have a functional (virtual) Python environment, with Python version 3.7 or higher. Make sure that at least the following Python packages are installed. Please note that, depending on your environment, there may be additional packages that require installation.
- The `requests` package is the de facto standard for making HTTP requests in Python.
- The `pandas` package provides data analysis capability to Python. According to Wes McKinney, the creator of Pandas, its name is derived from 'panel data', which refers to multi-dimensional structured data sets. Conceptually, it offers capabilities similar to the R language.
- The `matplotlib` package is a comprehensive library for creating static, animated, and interactive visualizations in Python.
The following statements are an example of how to install these Python packages in a Windows environment. Installation for a Linux environment will be similar.
```
py -m pip install pip --upgrade
py -m pip install requests
py -m pip install pandas
py -m pip install matplotlib
```
Process and visualize the data
Import the required modules
```python
import requests
import pandas as pd
import matplotlib.pyplot as plt
```
Configure the API credentials to use
The credentials for the ODATA API are shown on the General Information page of the BlueDolphin user interface, which you can find by navigating to Admin → System → General. On that page, the username to use is shown as `Database` (which is also your tenant name) and the password is shown as `Data Collection API Key`.
```python
# Define the username and password to use for the API call
username = '<database name>'
password = '<api key>'
```
Request BlueDolphin objects using the ODATA API
The following code fragment calls the ODATA API, retrieves all objects in the BlueDolphin repository, and stores the response in the `response` variable.
The response JSON looks like the following fragment, with the relevant data contained in the `value` array:
```json
{
    "@odata.context": "https://martijnschiedon.odata.bluedolphin.valueblue.nl/$metadata#Objects(ID,Title,Definition,Category)",
    "value": [
        {
            "ID": "568a7057bb1d7106c8b887fe",
            "Title": "Bedrijfsproces",
            "Definition": "Bedrijfsproces",
            "Category": "Bedrijfslaag"
        },
        {
            "ID": "5db71b1bcf15092a5493f3f7",
            "Title": "Verkoop product",
            "Definition": "Bedrijfsproces",
            "Category": "Bedrijfslaag"
        },
        ...
        {
            "ID": "5ffc6d74ad3fbf340417361e",
            "Title": "Voertuig",
            "Definition": "Bedrijfsobject",
            "Category": "Bedrijfslaag"
        }
    ]
}
```
Please note that:

- Only the specified fields `ID`, `Title`, `Definition` and `Category` will be queried and returned in the response, which is more efficient than retrieving all fields.
- The API connection uses the new URL format, which now includes the tenant's name (`<tenant>.odata.bluedolphin.valueblue.nl`). This is useful when working with multiple tenants and stored credentials in applications such as Power BI or Excel, e.g. supporting simultaneous connections to both `test` and `production` environments.
For more information, see the OData 2.0 URI conventions documentation.
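As an aside, such query options don't have to be hardcoded into the URL. The sketch below assembles them with the standard library's `urlencode`; the `<tenant>` host and the `$top` value are illustrative additions and not part of the tutorial's own query (`requests` can also pass such options via its `params` argument):

```python
from urllib.parse import urlencode

# Base URL for the Objects endpoint; '<tenant>' is a placeholder for your own tenant name
base_url = 'https://<tenant>.odata.bluedolphin.valueblue.nl/Objects'

# Common OData query options: $select limits the returned fields, $top caps the result count.
# safe='$,' keeps the OData-specific characters readable instead of percent-encoding them.
query = urlencode({
    '$select': 'ID,Title,Definition,Category',
    '$top': '50',
}, safe='$,')

url = f'{base_url}?{query}'
print(url)
```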
```python
# Request objects by calling BlueDolphin's ODATA API
response = requests.get(
    url='https://martijnschiedon.odata.bluedolphin.valueblue.nl/Objects?$select=ID,Title,Definition,Category',
    auth=(username, password)
)
```
Normalize the JSON response and create a Pandas DataFrame
The following code fragment converts the API's JSON response into a Pandas DataFrame object, so that it can be manipulated further.
```python
# Convert every object in the 'value' array to a DataFrame row
df = pd.json_normalize(data=response.json(), record_path='value')

# Show a small data sample
df.head(10)
```
|   | ID | Title | Definition | Category |
|---|---|---|---|---|
| 0 | 568a7057bb1d7106c8b887fe | Bedrijfsproces | Bedrijfsproces | Bedrijfslaag |
| 1 | 5db71b1bcf15092a5493f3f7 | Verkoop product | Bedrijfsproces | Bedrijfslaag |
| 2 | 5db71b1bcf15092a5493f417 | Verkoop dienst | Bedrijfsproces | Bedrijfslaag |
| 3 | 568a6f7a126c16080c5f0241 | Actor | Actor | Bedrijfslaag |
| 4 | 5d9ba53ccf150905aca84fd3 | Klant | Actor | Bedrijfslaag |
| 5 | 5db76b47145c10438cbe35a9 | Afdeling verkoop | Actor | Bedrijfslaag |
| 6 | 5edf58a9145c1002381f4e9b | Leverancier A | Actor | Bedrijfslaag |
| 7 | 5edf58b6bbd0ab18a09c0272 | Leverancier B | Actor | Bedrijfslaag |
| 8 | 568ce5a5bb1d7102846e834b | Externe actor | Actor | Bedrijfslaag |
| 9 | 568ce5e0b4472d03701ec7df | Externe actor | Actor | Bedrijfslaag |
Show the categories of BlueDolphin objects in use
We select the distinct category values from the list of objects. This category relates to the ArchiMate layer or extension the object is a member of.
```python
# Use an iterator and f-string to pretty-print the categories
for category in df['Category'].unique():
    print(f"- {category}")
```
```
- Bedrijfslaag
- Standaard
- Applicatielaag
- Technologielaag
- Motivation Extension
- Logical Data Dictionary
```
Clean up each object's category description
Note that the previous statement returned the category values in Dutch, because the language setting of the BlueDolphin tenant is set to Dutch. Therefore, we will map and standardize these category values according to the following table:
| Category | Standardized Category |
|---|---|
| Bedrijfslaag | Business |
| Applicatielaag | Application |
| Technologielaag | Technology |
| Logical Data Dictionary | Data |
| Other categories | Other |
```python
# Define a sneaky mapping of the first character of the category to the standardized category
category_map = {
    'B': 'Business',
    'A': 'Application',
    'T': 'Technology',
    'L': 'Data',
}

# Use a lambda function to map the categories, using a default of 'Other' for categories not in the map
df['Standardized Category'] = df['Category'].map(lambda x: category_map.get(x[0], 'Other'))

# Show an example of each of the categories being mapped
df.drop_duplicates(subset=['Category', 'Standardized Category'])
```
|   | ID | Title | Definition | Category | Standardized Category |
|---|---|---|---|---|---|
| 0 | 568a7057bb1d7106c8b887fe | Bedrijfsproces | Bedrijfsproces | Bedrijfslaag | Business |
| 11 | 568a70612a4e9b04782ff8fd | Locatie | Locatie | Standaard | Other |
| 14 | 57cd6315104f0806ac3f53a7 | Koppeling | Koppeling | Applicatielaag | Application |
| 48 | 568a715bbb1d7106c8b888c9 | Server | Server | Technologielaag | Technology |
| 236 | 5de6577b145c101cf4881ed9 | Optimale doorstroming | Doel | Motivation Extension | Other |
| 237 | 5db6c8c5145c10438cb6b801 | Nummer | Dataverzameling Verkoop | Logical Data Dictionary | Data |
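The first-character trick above is compact, but it silently maps any future category that happens to start with B, A, T or L. A minimal sketch of a sturdier alternative that maps the full category names, using a small stand-in frame (the `demo` frame and `full_category_map` name are illustrative, not part of the tutorial):

```python
import pandas as pd

# Map full category names instead of first characters;
# categories not in the map fall back to 'Other' via fillna()
full_category_map = {
    'Bedrijfslaag': 'Business',
    'Applicatielaag': 'Application',
    'Technologielaag': 'Technology',
    'Logical Data Dictionary': 'Data',
}

# A small illustrative frame standing in for the real 'df'
demo = pd.DataFrame({'Category': ['Bedrijfslaag', 'Standaard', 'Logical Data Dictionary']})
demo['Standardized Category'] = demo['Category'].map(full_category_map).fillna('Other')
print(demo['Standardized Category'].tolist())  # ['Business', 'Other', 'Data']
```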
Sort the data
Note that sorting the data is not required and can take considerable time with large data sets. However, for this tutorial it results in a more consistent experience in places where data samples are shown. The chosen sort order is `Standardized Category`, `Definition` and `Title`.
```python
# Sort the values in place
df.sort_values(by=['Standardized Category', 'Definition', 'Title'], inplace=True)

# Show a small data sample
df.head(10)
```
|   | ID | Title | Definition | Category | Standardized Category |
|---|---|---|---|---|---|
| 34 | 5d9b9131145c101c04350d36 | Adlib | Applicatiecomponent | Applicatielaag | Application |
| 43 | 5db6d16dc572da1b5017c415 | AfterPay | Applicatiecomponent | Applicatielaag | Application |
| 45 | 57c998f23e3e8e0608434e01 | Applicatie 1 | Applicatiecomponent | Applicatielaag | Application |
| 46 | 57c998f23e3e8e0608434e06 | Applicatie 2 | Applicatiecomponent | Applicatielaag | Application |
| 47 | 57c998f23e3e8e0608434e0b | Applicatie 3 | Applicatiecomponent | Applicatielaag | Application |
| 32 | 568a70f1126c16080c5f05f7 | Applicatiecomponent | Applicatiecomponent | Applicatielaag | Application |
| 44 | 568a7067b4472d09a82f1398 | Applicatiecomponent | Applicatiecomponent | Applicatielaag | Application |
| 37 | 5d9b9132145c101c04350d8d | Collectiedata Services | Applicatiecomponent | Applicatielaag | Application |
| 39 | 5d9ba63abbd0ab18985c3906 | File Share | Applicatiecomponent | Applicatielaag | Application |
| 35 | 5d9b9132145c101c04350d53 | Join | Applicatiecomponent | Applicatielaag | Application |
Create a statistics DataFrame with the total number of objects in each standardized category
```python
# Group rows by the Standardized Category and count the number of object IDs within each group
object_count_per_category = df.groupby(by='Standardized Category')[['ID']].count()

# Change the column name to 'Object Count'
object_count_per_category.rename(columns={'ID': 'Object Count'}, inplace=True)

# Show the entire data set
object_count_per_category.iloc[:]
```
| Standardized Category | Object Count |
|---|---|
| Application | 46 |
| Business | 17 |
| Data | 60 |
| Other | 4 |
| Technology | 183 |
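The same per-category totals can also be obtained in one step with `value_counts()`, which counts occurrences of each value in a Series and sorts them by count. A minimal sketch on a small stand-in frame (the `demo` frame is illustrative, not the tutorial's data):

```python
import pandas as pd

# A small stand-in for the real 'df'
demo = pd.DataFrame({
    'ID': ['a', 'b', 'c', 'd'],
    'Standardized Category': ['Business', 'Business', 'Data', 'Other'],
})

# value_counts() yields the same totals as the groupby/count approach above
counts = demo['Standardized Category'].value_counts()
print(counts.to_dict())  # {'Business': 2, 'Data': 1, 'Other': 1}
```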
Visualize the total number of BlueDolphin objects per standardized category
```python
# Determine the x categories with the highest object count (where x=1 in this example)
top_counts = object_count_per_category['Object Count'].nlargest(1)

# Prepare the list of pie chart wedges to highlight (or 'explode' in matplotlib terms)
explode = [.15 if c else 0 for c in object_count_per_category['Object Count'].isin(top_counts)]

# Prepare the labels to use for the pie chart wedges (consisting of the category and object count)
labels = object_count_per_category.apply(lambda x: f"{x.name}\n({x.iloc[0]} objects)", axis=1)

# Display the pie chart
pie = object_count_per_category.plot(
    kind='pie',
    figsize=(16, 6),
    labels=labels,
    x='Standardized Category',
    xlabel='',
    y='Object Count',
    ylabel='',
    autopct='%1.1f%%',
    explode=explode,
    legend=False,
    shadow=False,
    textprops=dict(color='black', fontweight='normal', fontsize=12),
    wedgeprops=dict(width=0.6),
    labeldistance=1.1,
    pctdistance=0.7,
    rotatelabels=False,
    subplots=False,
)

# Set a title for good measure
plt.title('Total number of BlueDolphin objects per standardized category', fontweight='bold', fontsize=16)
```

Create the `object_count_per_definition` DataFrame
This DataFrame is created by grouping the data by the `Standardized Category` and `Definition` fields, where the field `Object Count` contains the number of BlueDolphin objects in the repository for each object definition.

The `Standardized Category` column needs to be included in the group because the column's values need to survive the aggregation performed by `count()`. Simply put, the column should keep its original values.

Please note that, unlike the previous grouping example, the columns used for grouping the data are not converted to the DataFrame's index. Occasionally, this makes using functions like `DataFrame.pivot()` simpler, but it is often not necessary.
```python
# Group the data by the categories and object definitions and count the total number of objects for each group
object_count_per_definition = df.groupby(by=['Standardized Category', 'Definition'], as_index=False).count().iloc[:, :3]

# Rename the 'ID' column to 'Object Count'
object_count_per_definition.rename(columns={'ID': 'Object Count'}, inplace=True)

# Show a sample of the data
object_count_per_definition.head(10)
```
|   | Standardized Category | Definition | Object Count |
|---|---|---|---|
| 0 | Application | Applicatiecomponent | 16 |
| 1 | Application | Applicatiefunctie | 1 |
| 2 | Application | Data-object | 16 |
| 3 | Application | Ede Data Object | 6 |
| 4 | Application | Koppeling | 4 |
| 5 | Application | MN Processen Data Object | 2 |
| 6 | Application | Wegdomein Gegevensobject | 1 |
| 7 | Business | Actor | 8 |
| 8 | Business | Bedrijfsobject | 4 |
| 9 | Business | Bedrijfsproces | 3 |
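Because the grouping columns were kept as regular columns (`as_index=False`), reshaping with `DataFrame.pivot()` is straightforward. A minimal sketch on a small stand-in frame (the `demo` data is illustrative, not the tutorial's repository):

```python
import pandas as pd

# A small stand-in for 'object_count_per_definition'
demo = pd.DataFrame({
    'Standardized Category': ['Application', 'Application', 'Business'],
    'Definition': ['Applicatiecomponent', 'Koppeling', 'Actor'],
    'Object Count': [16, 4, 8],
})

# Pivot so each standardized category becomes a column, indexed by definition;
# combinations without data become NaN
wide = demo.pivot(index='Definition', columns='Standardized Category', values='Object Count')
print(wide)
```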
Visualize the number of BlueDolphin objects per object definition
This chart shows the number of BlueDolphin objects for each object definition as horizontal bars, with the largest bar at the top and the smallest bar at the bottom. It also shows the standardized category in colors similar to those used in the ArchiMate specification.
Please note that the horizontal axis uses a logarithmic scale.
```python
# Sort the dataset by the number of objects
object_count_per_definition.sort_values(by='Object Count', ascending=True, inplace=True)

# Dictionary to map the standardized categories to colors
color_map = {
    'Application': 'deepskyblue',
    'Business': 'gold',
    'Technology': 'yellowgreen',
    'Data': 'red',
}

# Build a list of the colors to use for each bar
colors = object_count_per_definition['Standardized Category'].map(lambda x: color_map.get(x, 'grey'))

# Plot the graph
object_count_per_definition.plot(
    kind='barh',
    figsize=(11, 6),
    x='Definition',
    xlabel='',
    y='Object Count',
    color=colors,
    fontsize=12,
    subplots=True,
    logx=True,
    legend=False,
)

# Set a title for good measure
plt.title('Use of BlueDolphin object definitions (logarithmic scale)\n', fontweight='bold', fontsize=16)
```

Conclusion
This tutorial has introduced the basics of data analysis with BlueDolphin's ODATA API, Python and Pandas. With the information provided, you can start discovering your own applications of this incredibly useful feature. Since the ODATA API also supports retrieving `Relation` data from the BlueDolphin architecture repository, analyzing the relations and related objects might very well be the subject of a future tutorial.
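As a starting point for such an exploration, the request URL follows the same pattern as for objects. A minimal sketch that only builds the URL; note that the `Relations` entity set name and the `<tenant>` host are assumptions based on the `Objects` endpoint's naming, not confirmed by this tutorial:

```python
from urllib.parse import urlencode

# '<tenant>' is a placeholder; 'Relations' is an assumed entity set name
# following the same pattern as the 'Objects' endpoint
base_url = 'https://<tenant>.odata.bluedolphin.valueblue.nl/Relations'
query = urlencode({'$top': '100'}, safe='$')
url = f'{base_url}?{query}'
print(url)
```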
Happy programming!