
Python script

How to Create a main.py for Remote Execution with Multiple Data Files

This guide walks through an example main.py implementation. It demonstrates how to read data from files, process it, and generate results in different formats.

In this example, we will have this directory structure:

|-- utils/
| |-- cluster.py
| |-- myfile.py
| |-- test/
| | |-- test.txt
|-- main.py
|-- requirements.txt
  • The main.py and requirements.txt files are mandatory and must have these specific filenames.
  • Other files can be added to use different Python modules; in this example, we are using the utils directory. These modules can have any name and directory structure, as long as they are correctly loaded in the main.py file.
  • The requirements.txt file should list the library versions required for execution. All of them must be compatible with Python 3.8 (see the sample sketched after this list).
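For reference, a requirements.txt for this example might look like the following. The packages and pinned versions are illustrative assumptions, not a fixed requirement; list whatever Python 3.8-compatible versions your code actually needs.

pandas==1.5.3
scikit-learn==1.2.2
matplotlib==3.7.1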

Import Required Libraries

There are no specific libraries that must be imported. You can import any library you need (e.g., pandas), as well as other Python modules from your own files.

import pandas as pd
from utils.cluster import clustering
from utils.myfile import test_function
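
For illustration, utils/myfile.py could be as simple as the sketch below. The body of test_function is a hypothetical placeholder; any module that main.py can import will work.

# utils/myfile.py

def test_function():
    # Placeholder logic; replace with whatever your module should do.
    return "test_function was imported and executed correctly"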

Import Data Files (shareable samples)

Each data file should be downloaded and stored in a folder named after its ID (the data file’s UUID can be found in its metadata), in the same directory as the main.py script.


You can copy the dataset ID from the dataset details page on the RAISE portal.

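For example, after downloading two data files, the working directory could look like this (the UUIDs below are placeholders):

|-- 00000000-0000-0000-0000-000000000000/
| |-- datafile.csv
|-- 11111111-1111-1111-1111-111111111111/
| |-- datafile.csv
|-- utils/
|-- main.py
|-- requirements.txt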

Read Data Files

The file path for loading data should follow this format:

datafile_uuid = "00000000-0000-0000-0000-000000000000"
datafile_extension = "csv"
file_path = f"{datafile_uuid}/datafile.{datafile_extension}"

Inside this directory, the data file must always be named datafile with the appropriate extension (e.g., .csv, .txt, .edf, .json…). Replace datafile_uuid and datafile_extension with the actual values for your dataset. You can repeat this process to read multiple data files. Use a loading function that matches the data type; for example, for a CSV file you can use read_csv from pandas.

data = pd.read_csv(file_path)
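
If a data file is not a CSV, swap in a loader that matches its type. A minimal sketch for a JSON and a plain-text file, using placeholder UUIDs:

import json

# JSON data file
with open("11111111-1111-1111-1111-111111111111/datafile.json") as f:
    json_data = json.load(f)

# Plain-text data file
with open("22222222-2222-2222-2222-222222222222/datafile.txt") as f:
    text_data = f.read()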

Run the Code

In this section, you can add the code to process the data. The code will depend on the specific task you want to perform (e.g., clustering, analysis, etc.). Here’s an example:

clustering_results = clustering(data)
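
The clustering function itself lives in utils/cluster.py. Below is a minimal sketch of what it might contain, assuming scikit-learn is listed in requirements.txt; the actual implementation is entirely up to you.

# utils/cluster.py
from sklearn.cluster import KMeans

def clustering(data, n_clusters=3):
    # Fit k-means on the (assumed numeric) columns and label each row with its cluster.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    labeled = data.copy()
    labeled["cluster"] = model.fit_predict(data)
    return labeled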

Gather Results

Any type of file can be produced as a result (images, CSV, text…), and the number of results is not limited. Note that results must be stored under the “results” directory, which sits at the same level as the main.py script.

os.makedirs("results", exist_ok=True)  # requires an `import os` at the top of main.py
clustered_data.to_csv("results/clustered_data.csv")
reduced_clustered_data.to_csv("results/reduced_clustered_data.csv")
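
Images can be stored the same way. For instance, assuming plot_clusters_figure returns a matplotlib Figure (an assumption; adapt this to whatever your plotting code returns):

# Assumes clusters_figure is a matplotlib Figure object.
clusters_figure.savefig("results/clusters_figure.png")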

Complete main.py script

import os

import pandas as pd
from utils.cluster import cluster_data, plot_clusters_figure
from utils.myfile import test_function

# Read data
# Please, do not edit this line of code.
data = pd.read_csv("08dd4107-1d39-4ef9-8233-768eabac6ca6/datafile.csv")
test_df = pd.read_csv("08dd4105-c685-4884-8080-049406dda784/datafile.csv")

result = test_function()
print(result)

# Code to run
# Add the code to run here. You can create different functions; the results should be clearly stated.
clustered_data, reduced_clustered_data, cluster_centers = cluster_data(data=data)
clusters_figure = plot_clusters_figure(data=reduced_clustered_data, centroids=cluster_centers)

# Save results
os.makedirs("results", exist_ok=True)
clustered_data.to_csv("results/clustered_data.csv")
reduced_clustered_data.to_csv("results/reduced_clustered_data.csv")

Logs

Finally, the log system has been improved. If the experiment succeeds, the user’s print() calls in the script are logged. If an error occurs, the logs record the exact error that caused the execution to fail.
If the main.py execution fails, you will be able to see the exact reason for the failure (wrongly defined variables, unexpected indentation…).
If the creation of the child container is not successful, the logs will contain the reason for the failure (incompatible versions in requirements.txt, non-existent package versions…).

You can find some examples in the templates section.