How to create a main.py for remote execution with multiple datafiles
This is an example implementation of main.py. It demonstrates how to read data from files, process them, and generate results in different formats.
In this example, we will have this directory structure:

```
|-- utils/
|   |-- cluster.py
|   |-- myfile.py
|   |-- test/
|   |   |-- test.txt
|-- main.py
|-- requirements.txt
```
- The `main.py` and `requirements.txt` files are mandatory and must have these specific filenames.
- Other files can be added to use different Python modules; in this example, we are using the `utils` directory. These modules can have any name and directory structure, as long as they are correctly loaded in the `main.py` file.
- The `requirements.txt` file should list the required library versions for execution. All of them must be compatible with `Python 3.8` (see the example below).
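A minimal sketch of such a file, matching the libraries used in this example; the pinned versions are assumptions, so pick versions that match your own code and remain compatible with Python 3.8:

```
pandas==1.5.3
scikit-learn==1.2.2
matplotlib==3.7.1
```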
Import Required Libraries
There are no specific libraries that must be imported; you can import any library for your own use (e.g., `pandas`), as well as other Python modules from your files.
```python
import pandas as pd
from utils.cluster import clustering
from utils.myfile import test_function
```
Import Data Files (shareable samples)
Each data file should be downloaded and stored in a folder named after its ID (the data file's UUID, which can be found in its metadata) in the same directory as the `main.py` script. You can copy the dataset ID from the dataset details page on the RAISE portal.
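For instance, with two data files the working directory would look like this (the UUIDs below are placeholders, not real dataset IDs):

```
|-- 00000000-0000-0000-0000-000000000000/
|   |-- datafile.csv
|-- 11111111-1111-1111-1111-111111111111/
|   |-- datafile.csv
|-- main.py
|-- requirements.txt
```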
Read Data Files
The file path for loading data should follow this format:
```python
datafile_uuid = "00000000-0000-0000-0000-000000000000"
datafile_extension = "csv"
file_path = f"{datafile_uuid}/datafile.{datafile_extension}"
```
Inside this directory, the data file must always be named `datafile` with the appropriate extension: e.g., `.csv`, `.txt`, `.edf`, `.json`…
Replace `datafile_uuid` and `datafile_extension` with the actual values for your dataset. You can repeat this process to read multiple datafiles, as in the sketch below.
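A minimal sketch of that repetition for two CSV datafiles; `load_csv_datafile` is a hypothetical helper and the UUIDs are placeholders:

```python
import pandas as pd

def load_csv_datafile(datafile_uuid: str) -> pd.DataFrame:
    # Hypothetical helper: builds the <uuid>/datafile.csv path and loads it.
    return pd.read_csv(f"{datafile_uuid}/datafile.csv")

# Placeholder UUIDs: replace them with the IDs from your datasets' metadata.
data = load_csv_datafile("00000000-0000-0000-0000-000000000000")
test_df = load_csv_datafile("11111111-1111-1111-1111-111111111111")
```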
Use different functions to load the data depending on the data type. For example, if you have a CSV file, you can use `read_csv` from `pandas`:
```python
data = pd.read_csv(file_path)
```
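For other extensions the loader changes accordingly. A sketch, assuming the same `<uuid>/datafile.<ext>` naming convention (the exact reader depends on your data):

```python
import json

# JSON datafile, read with the standard library:
with open(f"{datafile_uuid}/datafile.json") as f:
    records = json.load(f)

# Plain-text datafile:
with open(f"{datafile_uuid}/datafile.txt") as f:
    text = f.read()
```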
Run the Code
In this section, you can add the code to process the data. The code will depend on the specific task you want to perform (e.g., clustering or analysis). Here's an example:
```python
clustering_results = clustering(data)
```
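The `clustering` function lives in your own `utils/cluster.py`, and its contents are entirely up to you. As a sketch only (a hypothetical KMeans-based implementation, not the module actually used in this example):

```python
# utils/cluster.py (hypothetical sketch)
import pandas as pd
from sklearn.cluster import KMeans

def clustering(data: pd.DataFrame, n_clusters: int = 3) -> pd.DataFrame:
    # Fit KMeans on the numeric columns and attach the cluster labels.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labeled = data.copy()
    labeled["cluster"] = model.fit_predict(data.select_dtypes("number"))
    return labeled
```

If you use a module like this, remember to list its dependencies (here, scikit-learn) in `requirements.txt`.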
Gather Results
Any type of file can be produced as a result: images, CSV, text… Moreover, the number of results is not limited. Note that the results must now be stored under the `results` directory (at the same level as the main file execution).
```python
clustered_data.to_csv("results/clustered_data.csv")
reduced_clustered_data.to_csv("results/reduced_clustered_data.csv")
```
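Since the `results` directory may not exist when the script starts, it is safest to create it first (the complete script below does this). Non-CSV outputs work the same way; for example, assuming `clusters_figure` is a matplotlib figure as in the script below:

```python
import os

os.makedirs("results", exist_ok=True)  # create the results directory if missing

# Assumption: clusters_figure is a matplotlib Figure.
clusters_figure.savefig("results/clusters_figure.png")
```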
Complete main.py script
```python
import os

import pandas as pd

from utils.cluster import cluster_data, plot_clusters_figure
from utils.myfile import test_function

# Read data
# Please, do not edit these lines of code.
data = pd.read_csv("08dd4107-1d39-4ef9-8233-768eabac6ca6/datafile.csv")
test_df = pd.read_csv("08dd4105-c685-4884-8080-049406dda784/datafile.csv")

result = test_function()
print(result)

# Code to run
# Add here the code to run. You can create different functions,
# and the results should be clearly stated.
clustered_data, reduced_clustered_data, cluster_centers = cluster_data(data=data)
clusters_figure = plot_clusters_figure(data=reduced_clustered_data, centroids=cluster_centers)

# Gather results
os.makedirs("results", exist_ok=True)
clustered_data.to_csv("results/clustered_data.csv")
reduced_clustered_data.to_csv("results/reduced_clustered_data.csv")
```
Logs
Finally, the log system has been improved. If the experiment does not fail, the user's `print()` calls in the script are logged. In the case of an error, the logs record the exact error that caused the execution to fail.
If the `main.py` execution fails, you will be able to see the exact reason for the failure (wrongly defined variables, unexpected indentation…).
If the creation of the child container is not successful, the logs will contain the reason for the failure (incompatible versions in the `requirements.txt`, non-existing package versions…).
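For example, progress messages printed from the script end up in the logs of a successful run:

```python
# Messages like these are captured in the experiment logs when the run succeeds.
print(f"Loaded {len(data)} rows")
print("Clustering finished, writing results...")
```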
You can find some examples in the templates section.