src package
Submodules
src.io_functions module
- src.io_functions.add_attachment_to_project(username: str, password: str, host: str, port: int, project_id: int, file_path: str) None [source]
This function adds an attachment to a project on an OMERO server. It takes the username, password, host, port, project ID, and file path as input.
Parameters: username (str): The username. password (str): The password. host (str): The host address. port (int): The port number. project_id (int): The project ID. file_path (str): The file path.
- src.io_functions.auto_insert_fs_storage_to_database()[source]
This function automatically inserts file storage data into the database. It retrieves the project list and file dictionary from the netstore, then iterates over each project. For each project, it inserts the object data into the database and retrieves the object ID. For each file in the project, it retrieves the file stats and inserts them into the fs_storage table; it also extracts tags from the file path, cleans and converts them, and inserts them into the tag table. Finally, it removes duplicate objects and tags from the database.
- src.io_functions.auto_insert_omero_to_database()[source]
This function automatically inserts OMERO data into the database. It connects to the OMERO server and retrieves all projects. For each project, it extracts the project data and inserts it into the database, then extracts, processes, and inserts each of the project's tags. It repeats the same steps for every dataset in the project and for every image in each dataset. Finally, it removes duplicate objects and tags from the database.
- src.io_functions.calculate_binary_overlap(df1, col1, df2, col2, epsilon)[source]
This function calculates the binary overlap between two columns of two DataFrames based on a given epsilon value. It takes two DataFrames, two column names, and an epsilon value as input and returns a DataFrame containing the binary overlap results.
Parameters: df1 (pd.DataFrame): The first DataFrame. col1 (str): The name of the column in the first DataFrame. df2 (pd.DataFrame): The second DataFrame. col2 (str): The name of the column in the second DataFrame. epsilon (float): The maximum time difference allowed for a match.
Returns: pd.DataFrame: A DataFrame containing the binary overlap results.
- src.io_functions.calculate_percentage(str1: str, str2: str) float [source]
This function calculates the percentage of common characters between two strings. It takes two strings as input and returns the percentage of common characters.
Parameters: str1 (str): The first string. str2 (str): The second string.
Returns: float: The percentage of common characters.
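The description above does not say how "common characters" are counted. As a minimal sketch, assuming characters are matched as a multiset and the count is normalized by the longer string, the idea could look like this (`common_char_percentage` is a hypothetical name, not the module's implementation):

```python
from collections import Counter


def common_char_percentage(str1: str, str2: str) -> float:
    """Hypothetical stand-in for calculate_percentage (assumed definition)."""
    if not str1 or not str2:
        return 0.0
    # Multiset intersection: a character counts at most as often as it
    # occurs in both strings.
    common = sum((Counter(str1) & Counter(str2)).values())
    # Assumption: normalize by the longer string so the result stays in 0..100.
    return 100.0 * common / max(len(str1), len(str2))
```

Under these assumptions identical strings score 100.0 and strings with no shared characters score 0.0; the real function may use a different normalization.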
- src.io_functions.check_db_table(db_name: str, table_name: str) None [source]
This function checks if a table exists in a SQLite database. If the table does not exist, it creates the table using the SQL code from the get_create_table_sql() function.
Parameters: db_name (str): The name of the SQLite database. table_name (str): The name of the table to check.
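The check-then-create behaviour described above can be sketched with the standard `sqlite3` module. This is an illustration only: `ensure_table` and the `CREATE_SQL` dictionary are hypothetical stand-ins for check_db_table() and the output of get_create_table_sql(), whose real contents are not shown here.

```python
import sqlite3

# Assumption: maps table names to CREATE TABLE statements, standing in for
# the dictionary returned by get_create_table_sql().
CREATE_SQL = {"tag": "CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT)"}


def ensure_table(db_name: str, table_name: str) -> None:
    """Create table_name in db_name if sqlite_master does not list it."""
    conn = sqlite3.connect(db_name)
    try:
        row = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
            (table_name,),
        ).fetchone()
        if row is None:
            conn.execute(CREATE_SQL[table_name])
            conn.commit()
    finally:
        conn.close()
```

Calling the function a second time with the same arguments leaves the existing table untouched.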
- src.io_functions.check_for_omero_entries(db_name, omero_name, object_type)[source]
This function checks for existing entries in the OMERO source of a SQLite database based on the specified OMERO name and object type. It takes a database name, an OMERO name, and an object type as input and returns a DataFrame containing the retrieved data.
Parameters: db_name (str): The name of the SQLite database. omero_name (str): The OMERO name to check for. object_type (str): The object type to check for.
Returns: pd.DataFrame: A DataFrame containing the retrieved data.
- src.io_functions.check_if_entry_exists(db_name: str, table_name: str, object_document: Dict[str, any]) int [source]
This function checks if a specific entry exists in a SQLite database table. It takes the database name, table name, and a dictionary containing the specific ID as input. The function returns 1 if the entry exists and 0 otherwise.
Parameters: db_name (str): The name of the SQLite database. table_name (str): The name of the table to check. object_document (Dict[str, any]): A dictionary containing the specific ID.
Returns: int: 1 if the entry exists, 0 otherwise.
- src.io_functions.convert_abbreviation(og_string: str) str [source]
This function converts a string of abbreviations into a string of full forms using a dictionary of abbreviations. It takes a string as input and returns the updated string.
Parameters: og_string (str): The input string containing the abbreviations.
Returns: str: The updated string containing the full forms.
- src.io_functions.convert_abbreviations(df: DataFrame, colname: str = 'tags') DataFrame [source]
This function replaces abbreviations in a DataFrame column with their full forms using a dictionary of abbreviations. It takes a DataFrame and a column name as input and returns the updated DataFrame.
Parameters: df (pd.DataFrame): The input DataFrame. colname (str): The name of the column containing the tags. Default is “tags”.
Returns: pd.DataFrame: The updated DataFrame.
- src.io_functions.convert_omero_timestamp(timestamp)[source]
This function converts a timestamp from OMERO format to a standard format.
Parameters: timestamp (str): The timestamp in OMERO format.
Returns: str: The timestamp in standard format.
- src.io_functions.convert_rspace_timestamp(timestamp)[source]
This function converts a timestamp from RSpace format to a standard format.
Parameters: timestamp (str): The timestamp in RSpace format.
Returns: str: The timestamp in standard format.
- src.io_functions.copy_author_from_projectcolumn(df: DataFrame) DataFrame [source]
This function takes a DataFrame as input and copies the author information from the project column to the author column. The author information is expected to be present in the project column in the format “Project Name (Author Name)”. The function extracts the author name from the project column and appends it to the author column. If the author column already contains some information, the new author name is appended to it. The function also removes any duplicates from the author column and returns the updated DataFrame.
Parameters: df (pd.DataFrame): The input DataFrame.
Returns: pd.DataFrame: The updated DataFrame.
- src.io_functions.create_bulk_import_file(id, path, docker_vol_path, filename='./data/image_import_files.csv')[source]
This function creates a bulk import file for OMERO.
Parameters: id (str): The ID of the dataset. path (str): The path to the image file. docker_vol_path (str): The path to the Docker volume. filename (str): The name of the bulk import file. Default is ‘./data/image_import_files.csv’.
Returns: int: 0 if the operation is successful.
- src.io_functions.create_link(high_object, low_object, type='tag')[source]
This function creates a link between two objects in OMERO.
Parameters: high_object (object): The parent object. low_object (object): The child object. type (str): The type of the link. Default is ‘tag’.
- src.io_functions.create_object(project_name: str, type: str) object [source]
This function creates a new project or dataset in OMERO.
Parameters: project_name (str): The name of the project or dataset. type (str): The type of the object. It can be ‘Project’ or ‘Dataset’.
Returns: object: The newly created project or dataset object.
- src.io_functions.create_rspace_document(doc_name, content, parent_folder_id, tags='', apikey='e7HV1YLT8BvhM2dSoVvzIyVK0svNexJq')[source]
This function creates a new document in RSpace.
Parameters: doc_name (str): The name of the document. content (str): The content of the document. parent_folder_id (int): The ID of the parent folder. tags (str): The tags for the document. Default is an empty string. apikey (str): The API key. Default is ‘e7HV1YLT8BvhM2dSoVvzIyVK0svNexJq’.
Returns: str: The output of the command.
- src.io_functions.create_rspace_document_header(egroupware_project_name, fs_storage_project_folder, overlap_ratio)[source]
This function creates a header for an RSpace document.
Parameters: egroupware_project_name (str): The name of the egroupware project. fs_storage_project_folder (str): The name of the fs_storage project folder. overlap_ratio (float): The overlap ratio.
Returns: str: The header.
- src.io_functions.create_rspace_files_table(header, tar_id, db_name='sync_database.db')[source]
This function creates a table of files for RSpace.
Parameters: header (str): The header of the table. tar_id (int): The ID of the tar file. db_name (str): The name of the database. Default is ‘sync_database.db’.
Returns: list: A list of files.
- src.io_functions.create_rspace_folder(folder_name, apikey='e7HV1YLT8BvhM2dSoVvzIyVK0svNexJq')[source]
This function creates a new folder in RSpace.
Parameters: folder_name (str): The name of the folder. apikey (str): The API key. Default is ‘e7HV1YLT8BvhM2dSoVvzIyVK0svNexJq’.
Returns: str: The output of the command.
- src.io_functions.create_system_overlap() DataFrame [source]
This function creates a DataFrame containing the overlap between egroupware and netstore DataFrames based on project, author, and tags columns. It returns the DataFrame containing the overlap.
Returns: pd.DataFrame: A DataFrame containing the overlap between egroupware and netstore DataFrames.
- src.io_functions.create_tag(tag_name, tag_description) object [source]
This function creates a new tag in OMERO.
Parameters: tag_name (str): The name of the tag. tag_description (str): The description of the tag.
Returns: object: The newly created tag object.
- src.io_functions.delete_duplicate_links(db_name='./data/sync_database.db')[source]
This function deletes duplicate links from a SQLite database table. It takes the database name as input.
Parameters: db_name (str): The name of the SQLite database. Default is ‘./data/sync_database.db’.
- src.io_functions.delete_duplicate_objects(db_name: str = './data/sync_database.db') int [source]
This function deletes duplicate objects from a SQLite database table. It takes the database name as input and returns 0 if the operation is successful.
Parameters: db_name (str): The name of the SQLite database. Default is ‘./data/sync_database.db’.
Returns: int: 0 if the operation is successful.
- src.io_functions.delete_duplicate_tags(db_name: str = './data/sync_database.db') int [source]
This function deletes duplicate tags from a SQLite database table. It takes the database name as input and returns 0 if the operation is successful.
Parameters: db_name (str): The name of the SQLite database. Default is ‘./data/sync_database.db’.
Returns: int: 0 if the operation is successful.
- src.io_functions.delete_duplicates_bulk_import(bulk_file_src_path)[source]
This function removes duplicate rows from a CSV file containing a list of files/folders to be imported into OMERO. It takes the source file path as input.
Parameters: bulk_file_src_path (str): The source file path.
- src.io_functions.generate_html_table(files=[{'access': 'manuel', 'metadata': 'empty', 'name': 'file1.xls', 'path': '/workspace/editor/structuredDocument/file1.xls', 'type': 'sheet'}, {'access': 'manuel', 'metadata': 'empty', 'name': 'file2.pdf', 'path': '/workspace/editor/structuredDocument/file2.pdf', 'type': 'pdf'}, {'access': 'omero', 'metadata': 'empty', 'name': 'file3.tif', 'path': '/workspace/editor/structuredDocument/file3.tif', 'type': 'image (tif)'}, {'access': 'manuel', 'metadata': 'empty', 'name': 'file4.tar', 'path': '/workspace/editor/structuredDocument/file4.tar', 'type': 'archive (tar)'}], header='')[source]
This function generates an HTML table.
Parameters: files (list): A list of files. Default is the output of get_placeholder_files_for_rspace(). header (str): The header of the table. Default is an empty string.
Returns: str: The HTML table.
- src.io_functions.get_abbreviation_dict() dict [source]
This function loads the abbreviations JSON file and returns the data as a dictionary.
Returns: dict: The abbreviations data as a dictionary.
- src.io_functions.get_cleaned_tag_string(input_string: str) str [source]
This function takes a string as input and cleans it by replacing certain separators with semicolons, removing duplicates, and removing stopwords. The function returns the cleaned string.
Parameters: input_string (str): The input string.
Returns: str: The cleaned string.
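The three cleaning steps named above (normalize separators to semicolons, drop duplicates, drop stopwords) could be sketched as follows. The separator set, the stopword list, and the case-insensitive duplicate check are all assumptions made for illustration; the module's own choices are not documented here.

```python
import re

# Assumptions: the separator set and stopword list are illustrative only.
SEPARATORS = r"[,/|_\s]+"
STOPWORDS = {"the", "and", "of"}


def clean_tag_string(input_string: str) -> str:
    """Hypothetical sketch of the cleaning steps, not the module's code."""
    # Step 1: collapse every run of separators into a single semicolon.
    normalized = re.sub(SEPARATORS, ";", input_string.strip())
    seen = set()
    tags = []
    for tag in normalized.split(";"):
        key = tag.lower()
        # Steps 2 and 3: skip empty pieces, stopwords, and repeated tags.
        if not tag or key in STOPWORDS or key in seen:
            continue
        seen.add(key)
        tags.append(tag)
    return ";".join(tags)
```

The sketch keeps the first occurrence of each tag and preserves the original order.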
- src.io_functions.get_col_overlap_df(df1: DataFrame, df2: DataFrame, colname1: str, colname2: str) DataFrame [source]
This function calculates the percentage of common characters between two columns of two DataFrames. It takes two DataFrames and two column names as input and returns a DataFrame containing the combinations of values and their corresponding percentage of common characters.
Parameters: df1 (pd.DataFrame): The first DataFrame. df2 (pd.DataFrame): The second DataFrame. colname1 (str): The name of the column in the first DataFrame. colname2 (str): The name of the column in the second DataFrame.
Returns: pd.DataFrame: A DataFrame containing the combinations of values and their corresponding percentage of common characters.
- src.io_functions.get_create_table_sql() dict [source]
This function loads the database create SQL JSON file and returns the data as a dictionary.
Returns: dict: The database create SQL data as a dictionary.
- src.io_functions.get_current_username()[source]
This function retrieves the current username.
Returns: str: The current username.
- src.io_functions.get_dataframe(df_type, db_name='sync_database.db')[source]
This function retrieves data from a SQLite database based on the specified data type. It takes a data type and a database name as input and returns a DataFrame containing the retrieved data.
Parameters: df_type (str): The type of data to retrieve. db_name (str, optional): The name of the SQLite database. Defaults to ‘sync_database.db’.
Returns: pd.DataFrame: A DataFrame containing the retrieved data.
- src.io_functions.get_dataset_fs_storage_name(db_name, base_path)[source]
This function retrieves data from the fs_storage source of a SQLite database based on the specified base path. It takes a database name and a base path as input and returns a DataFrame containing the retrieved data.
Parameters: db_name (str): The name of the SQLite database. base_path (str): The base path to retrieve data for.
Returns: pd.DataFrame: A DataFrame containing the retrieved data.
- src.io_functions.get_description_dict() dict [source]
This function loads the descriptions JSON file and returns the data as a dictionary.
Returns: dict: The descriptions data as a dictionary.
- src.io_functions.get_df_if_exist(file_path: str) DataFrame [source]
This function takes a file path as input and checks if a file exists at that path. If the file exists, it loads the file into a DataFrame and returns the DataFrame. If the file does not exist, it returns None.
Parameters: file_path (str): The path to the file.
Returns: pd.DataFrame: The DataFrame if the file exists, None otherwise.
- src.io_functions.get_df_name(var: DataFrame) str [source]
This function retrieves the name of a DataFrame variable. It takes a DataFrame as input and returns the name of the variable.
Parameters: var (pd.DataFrame): The DataFrame variable.
Returns: str: The name of the DataFrame variable.
- src.io_functions.get_egroupware_data(db_name: str, project_id: int) DataFrame [source]
This function retrieves egroupware data for a specific project ID from a SQLite database. It takes a database name and a project ID as input and returns a DataFrame containing the egroupware data.
Parameters: db_name (str): The name of the SQLite database. project_id (int): The project ID.
Returns: pd.DataFrame: A DataFrame containing the egroupware data.
- src.io_functions.get_file_list(folder)[source]
Traverse the specified directory and its subdirectories to collect all the file paths.
Parameters: folder: The path of the directory to traverse.
Returns: A list of all the file paths.
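A recursive traversal like the one described for get_file_list() is commonly written with `os.walk`. The following is a minimal sketch under that assumption; `list_files` is a hypothetical name, and sorting the result is an addition made here so the output is deterministic.

```python
import os
from typing import List


def list_files(folder: str) -> List[str]:
    """Collect the paths of all files below folder, including subfolders."""
    paths = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            paths.append(os.path.join(root, name))
    # Sorting is not mentioned in the documentation; it only makes the
    # result reproducible across platforms.
    return sorted(paths)
```

`os.walk` yields nothing for a folder that does not exist, so the sketch returns an empty list in that case rather than raising.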
- src.io_functions.get_file_stats(file_path)[source]
This function retrieves file statistics such as name, extension, size, creation and modification timestamps.
Parameters: file_path (str): The path of the file.
Returns: dict: A dictionary containing the file statistics.
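The listed statistics can all be read from `os.stat`. A minimal sketch follows; the dictionary keys and the hypothetical function name `file_stats` are assumptions, since the actual key names used by get_file_stats() are not documented here.

```python
import os
from typing import Any, Dict


def file_stats(file_path: str) -> Dict[str, Any]:
    """Return basic statistics for one file (key names are assumed)."""
    info = os.stat(file_path)
    name, extension = os.path.splitext(os.path.basename(file_path))
    return {
        "name": name,
        "extension": extension,
        "size": info.st_size,
        # st_ctime is the creation time on Windows but the time of the last
        # metadata change on most Unix systems.
        "created": int(info.st_ctime),
        "modified": int(info.st_mtime),
    }
```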
- src.io_functions.get_filelist_from_database(tar_id, db_name='sync_database.db')[source]
This function retrieves a file list from a SQLite database based on the specified target ID. It takes a target ID and a database name as input and returns a DataFrame containing the file list.
Parameters: tar_id (int): The target ID to retrieve the file list for. db_name (str, optional): The name of the SQLite database. Defaults to ‘sync_database.db’.
Returns: pd.DataFrame: A DataFrame containing the file list.
- src.io_functions.get_int_from_date(date_str: str) int [source]
This function converts a date string in the format “YYYY-MM-DD” to an integer representing the number of days since the Unix epoch.
Parameters: date_str (str): The date string in the format “YYYY-MM-DD”.
Returns: int: The number of days since the Unix epoch.
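The conversion described above amounts to a date subtraction. A minimal sketch, with `days_since_epoch` as a hypothetical name for illustration:

```python
from datetime import date


def days_since_epoch(date_str: str) -> int:
    """Convert "YYYY-MM-DD" to whole days since 1970-01-01."""
    return (date.fromisoformat(date_str) - date(1970, 1, 1)).days
```

For example, `days_since_epoch("2000-01-01")` gives 10957.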
- src.io_functions.get_int_timestamp_from_iso(iso_timestamp: str) int [source]
This function takes an ISO timestamp as input and converts it to an integer timestamp.
Parameters: iso_timestamp (str): The ISO timestamp.
Returns: int: The integer timestamp.
- src.io_functions.get_iso_timestamp_from_int(int_timestamp: int) str [source]
This function takes an integer timestamp as input and converts it to an ISO timestamp.
Parameters: int_timestamp (int): The integer timestamp.
Returns: str: The ISO timestamp.
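The two timestamp helpers above are inverses of each other. The sketch below assumes the integer is Unix time in seconds and that timestamps without an offset are in UTC; neither assumption is confirmed by the documentation, and `iso_to_int` and `int_to_iso` are hypothetical names.

```python
from datetime import datetime, timezone


def iso_to_int(iso_timestamp: str) -> int:
    """Parse an ISO 8601 timestamp into Unix seconds (assumed unit)."""
    parsed = datetime.fromisoformat(iso_timestamp)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def int_to_iso(int_timestamp: int) -> str:
    """Format Unix seconds as an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(int_timestamp, tz=timezone.utc).isoformat()
```

With these definitions, `iso_to_int(int_to_iso(n)) == n` holds for any integer number of seconds within the range `datetime` supports.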
- src.io_functions.get_link_object_from_id(db_name, table_name, id)[source]
This function retrieves data from a SQLite database based on the specified table name and ID. It takes a database name, a table name, and an ID as input and returns a DataFrame containing the retrieved data.
Parameters: db_name (str): The name of the SQLite database. table_name (str): The name of the table to retrieve data from. id (int): The ID to retrieve data for.
Returns: pd.DataFrame: A DataFrame containing the retrieved data.
- src.io_functions.get_netstore_data(db_name: str, object_id: int, source: str = 'fs_storage') DataFrame [source]
This function retrieves netstore data for a specific object ID from a SQLite database. It takes a database name, an object ID, and a source as input and returns a DataFrame containing the netstore data.
Parameters: db_name (str): The name of the SQLite database. object_id (int): The ID of the object to retrieve data for. source (str, optional): The source of the object data. Defaults to ‘fs_storage’.
Returns: pd.DataFrame: A DataFrame containing the netstore data for the specified object ID.
- src.io_functions.get_netstore_filelist(folder='/home/omero-import')[source]
Organize the file paths into a dictionary based on the project names found in the file paths.
Parameters: folder: The path of the directory to traverse. Default is ‘/home/omero-import’.
Returns: A tuple containing a list of project names and a dictionary where each key is a project name and the value is a list of file paths belonging to that project.
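How the project name is derived from a file path is not stated. As a sketch, assuming the project is the first directory level below the base folder, the grouping step could look like this (`group_by_project` is a hypothetical helper that works on an already collected list of POSIX-style paths):

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def group_by_project(
    paths: List[str], folder: str
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Group file paths by the first directory below folder (assumed rule)."""
    grouped = defaultdict(list)
    for path in paths:
        relative = PurePosixPath(path).relative_to(folder)
        if len(relative.parts) < 2:
            # File sits directly in folder, so there is no project directory.
            continue
        grouped[relative.parts[0]].append(path)
    return list(grouped), dict(grouped)
```

Project names appear in the order in which they are first encountered.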
- src.io_functions.get_object_by_id(object_name, project_id) object [source]
This function retrieves an object from OMERO based on its name and ID.
Parameters: object_name (str): The name of the object. project_id (int): The ID of the object.
Returns: object: The retrieved object if found, otherwise None.
- src.io_functions.get_object_by_name(object_name, object_class='Project') object [source]
This function retrieves an object from OMERO based on its name and class.
Parameters: object_name (str): The name of the object. object_class (str): The class of the object. Default is ‘Project’.
Returns: object: The ID of the retrieved object if found, otherwise None.
- src.io_functions.get_object_id_from_netstore_name(db_name: str, table_name: str, netstore_name: str) int | None [source]
This function retrieves the object ID from a NetStore name in a SQLite database table. It takes the database name, table name, and NetStore name as input and returns the object ID if it exists, or None otherwise.
Parameters: db_name (str): The name of the SQLite database. table_name (str): The name of the table to search in. netstore_name (str): The NetStore name to search for.
Returns: Optional[int]: The object ID if it exists, or None otherwise.
- src.io_functions.get_object_id_from_specific_id(db_name: str, table_name: str, specific_id: str) int | None [source]
This function retrieves the object ID from a specific ID in a SQLite database table. It takes the database name, table name, and specific ID as input and returns the object ID if it exists, or None otherwise.
Parameters: db_name (str): The name of the SQLite database. table_name (str): The name of the table to search in. specific_id (str): The specific ID to search for.
Returns: Optional[int]: The object ID if it exists, or None otherwise.
- src.io_functions.get_placeholder_files_for_rspace()[source]
This function returns placeholder files for RSpace.
Returns: list: A list of placeholder files.
- src.io_functions.get_possible_tags_list(tags: List[str]) List[str] [source]
This function filters a list of tags based on certain patterns to remove irrelevant or invalid tags. It takes a list of tags as input and returns the filtered list of tags.
Parameters: tags (List[str]): The list of tags to filter.
Returns: List[str]: The filtered list of tags.
- src.io_functions.get_project_timestamps_from_fs_storage(db_name: str, object_id: int) Tuple[int, int] [source]
This function retrieves the earliest created timestamp and the latest timestamp (the maximum over both the created and modified timestamps) for a specific object ID in the fs_storage table of a SQLite database. It takes the database name and object ID as input and returns both values as a tuple.
Parameters: db_name (str): The name of the SQLite database. object_id (int): The object ID.
Returns: Tuple[int, int]: A tuple containing the minimum created timestamp and the maximum timestamp.
- src.io_functions.get_project_user_tuple(pm_title: str) Tuple[str, str] [source]
This function takes a project title as input and extracts the project name and user information from it. The project name is expected to be present before the opening parenthesis in the project title. The user information is expected to be present between the opening and closing parentheses in the project title. The function returns a tuple containing the project name and user information. If there is no user information present in the project title, the function returns the project name and an empty string.
Parameters: pm_title (str): The project title.
Returns: Tuple[str, str]: A tuple containing the project name and user information.
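The title format described above ("Project Name (User)") lends itself to a small regular expression. This is a sketch with a hypothetical name, `split_project_user`; stripping surrounding whitespace is an assumption.

```python
import re
from typing import Tuple

# Everything before the first "(" is the project; the text up to the next
# ")" is the user information.
_TITLE = re.compile(r"^(?P<project>[^(]*)\((?P<user>[^)]*)\)")


def split_project_user(pm_title: str) -> Tuple[str, str]:
    """Split a project title into (project name, user information)."""
    match = _TITLE.match(pm_title)
    if match is None:
        # No parentheses: the whole title is the project name.
        return pm_title.strip(), ""
    return match.group("project").strip(), match.group("user").strip()
```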
- src.io_functions.get_remaining_path(full_path: str, base_path: str) str [source]
This function takes a full path and a base path as input and returns the remaining path. If the full path starts with the base path, it removes the base path from the full path and returns the remaining path. If the full path does not start with the base path, it returns the full path as is.
Parameters: full_path (str): The full path. base_path (str): The base path.
Returns: str: The remaining path.
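The prefix rule above translates directly into a string comparison. A minimal sketch, using the hypothetical name `remaining_path`:

```python
def remaining_path(full_path: str, base_path: str) -> str:
    """Strip base_path from the front of full_path when it is a prefix."""
    if full_path.startswith(base_path):
        return full_path[len(base_path):]
    return full_path
```

Because this is a plain string prefix test, a base path of `/data` would also match `/database/x`; passing the base path with a trailing separator avoids that. Whether the real function guards against this case is not documented.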
- src.io_functions.get_rspace_workspace_folders(sampleParameter, elnName='rspace')[source]
This function retrieves workspace folders from RSpace inventory.
Parameters: sampleParameter (str): The sample parameter. elnName (str): The ELN name. Default is ‘rspace’.
Returns: dict: The JSON response containing the workspace folders.
- src.io_functions.get_sample_data_from_barcode(sampleParameter: str, elnName: str = 'rspace') Dict[str, any] [source]
This function retrieves sample data from a barcode using the RSpace API. It takes the sample parameter and the ELN name as input and returns a dictionary containing the sample data.
Parameters: sampleParameter (str): The sample parameter. elnName (str): The ELN name. Default is ‘rspace’.
Returns: Dict[str, any]: A dictionary containing the sample data.
- src.io_functions.get_secret_api_parameters(source: str = 'api_secrets.json', type: str = 'rspace') Dict[str, any] [source]
This function reads a JSON file containing secret API parameters and returns a dictionary of parameters for a specific type.
Parameters: source (str): The path to the JSON file containing secret API parameters. Default is ‘api_secrets.json’. type (str): The type of API parameters to return. Default is ‘rspace’.
Returns: Dict[str, any]: A dictionary of API parameters for the specified type.
- src.io_functions.get_stopwords(source: str = 'stopwords.json') List[str] [source]
This function reads a JSON file containing stopwords and returns a list of stopwords.
Parameters: source (str): The path to the JSON file containing stopwords. Default is ‘stopwords.json’.
Returns: List[str]: A list of stopwords.
- src.io_functions.get_tag_description(tag: str, descriptions: dict | None = None) str [source]
This function retrieves the description of a tag from a dictionary of descriptions. It takes a tag name and a dictionary of descriptions as input and returns the description of the tag if it is present, or an empty string otherwise.
Parameters: tag (str): The tag name. descriptions (dict): A dictionary of descriptions. Default is the output of get_description_dict().
Returns: str: The description of the tag if it is present, or an empty string otherwise.
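The `None` default in the signature suggests the descriptions are loaded only when the caller does not supply them. A sketch of that pattern, where `_default_descriptions` is a hypothetical stand-in for get_description_dict() and its contents are invented for illustration:

```python
from typing import Dict, Optional


def _default_descriptions() -> Dict[str, str]:
    # Hypothetical stand-in for get_description_dict(); the entry is made up.
    return {"tif": "Tagged Image File Format"}


def tag_description(tag: str, descriptions: Optional[Dict[str, str]] = None) -> str:
    """Return the description for tag, or an empty string if it is unknown."""
    if descriptions is None:
        # Load the default mapping only when the caller passed nothing.
        descriptions = _default_descriptions()
    return descriptions.get(tag, "")
```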
- src.io_functions.get_tags_from_id(db_name, id, df_type='fs_storage')[source]
This function retrieves tags from the database based on the object id and source.
Parameters: db_name (str): The name of the database. id (int): The object id. df_type (str): The source type. Default is ‘fs_storage’.
Returns: df (DataFrame): A pandas DataFrame containing the tag data.
- src.io_functions.get_tags_tuple(conn: object, user: str = 'inplace') List[Tuple[str, int]] [source]
This function retrieves a list of tuples containing tag names and IDs for a specific user. It takes a connection object and a user name as input and returns a list of tuples containing tag names and IDs.
Parameters: conn (object): The connection object. user (str): The user name. Default is ‘inplace’.
Returns: List[Tuple[str, int]]: A list of tuples containing tag names and IDs.
- src.io_functions.insert_dict_to_database(db_name: str, table_name: str, data_dict: Dict[str, any]) int [source]
This function inserts a dictionary of data into a SQLite database table. It takes the database name, table name, and a dictionary of data as input. The keys of the dictionary are used as column names and the values are used as the corresponding row values. If an IntegrityError is raised, the function commits the changes and closes the connection. The function returns 1 if an IntegrityError is raised and 0 otherwise.
Parameters: db_name (str): The name of the SQLite database. table_name (str): The name of the table to insert data into. data_dict (Dict[str, any]): A dictionary of data to insert into the table.
Returns: int: 1 if an IntegrityError is raised, 0 otherwise.
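The insert-or-report behaviour above can be sketched with `sqlite3` as follows. `insert_dict` is a hypothetical name, and the sketch is not the module's implementation; it only shows one way to build the statement from the dictionary keys and map an IntegrityError to the return value 1.

```python
import sqlite3
from typing import Any, Dict


def insert_dict(db_name: str, table_name: str, data: Dict[str, Any]) -> int:
    """Insert one row; return 1 on IntegrityError, 0 otherwise."""
    columns = ", ".join(data)
    placeholders = ", ".join("?" for _ in data)
    # Table and column names cannot be bound as parameters, so they must come
    # from trusted code; only the values are bound.
    sql = f"INSERT INTO {table_name} ({columns}) VALUES ({placeholders})"
    conn = sqlite3.connect(db_name)
    try:
        conn.execute(sql, tuple(data.values()))
        conn.commit()
        return 0
    except sqlite3.IntegrityError:
        return 1
    finally:
        conn.close()
```

An IntegrityError typically signals a violated UNIQUE or NOT NULL constraint, so a return value of 1 usually means the row was already present.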
- src.io_functions.insert_egroupware(verbose=0)[source]
This function inserts project registration and schedule data from egroupware into the database.
Parameters: verbose (int): Verbosity level. Default is 0.
- src.io_functions.is_string_in_list(s: str, lst: List[Tuple[str, int]]) int [source]
This function checks if a string is present in a list of tuples and returns the corresponding value. It takes a string and a list of tuples as input and returns the corresponding value if the string is present, or False otherwise.
Parameters: s (str): The string to search for. lst (List[Tuple[str, int]]): The list of tuples to search in.
Returns: int: The corresponding value if the string is present, or False otherwise.
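A lookup of this shape can be sketched as a linear scan over the tuples. `lookup` is a hypothetical name for illustration:

```python
from typing import List, Tuple, Union


def lookup(s: str, lst: List[Tuple[str, int]]) -> Union[int, bool]:
    """Return the value paired with s, or False if s is not in lst."""
    for name, value in lst:
        if name == s:
            return value
    return False
```

Since a stored value of 0 is falsy just like False, callers that need to tell the two apart should test the result with `is False` rather than relying on truthiness.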
- src.io_functions.load_json_file(file_path: str) dict [source]
This function takes a file path as input and loads the JSON file at that path. It returns the JSON data as a dictionary.
Parameters: file_path (str): The path to the JSON file.
Returns: dict: The JSON data as a dictionary.
- src.io_functions.omero_inplace_bulk_import(bulk_file_src_path, bulk_file_tar_path='~/groups/omero-import', docker_vol_path='/NETSTORE_omero-import', server='localhost', username='inplace', password='omero')[source]
This function performs a bulk in-place import of the files and folders listed in a CSV file into target datasets on an OMERO server using Docker. It takes the source file path, target file path, Docker volume path, server address, username, and password as input.
Parameters: bulk_file_src_path (str): The source file path. bulk_file_tar_path (str): The target file path. Default is ‘~/groups/omero-import’. docker_vol_path (str): The Docker volume path. Default is ‘/NETSTORE_omero-import’. server (str): The server address. Default is ‘localhost’. username (str): The username. Default is ‘inplace’. password (str): The password. Default is ‘omero’.
- src.io_functions.omero_inplace_import(target_ds: str, src: str, docker_vol_path: str = '/NETSTORE_omero-import', server: str = 'localhost', username: str = 'inplace', password: str = 'omero') None [source]
This function performs an inplace import of a source file into a target dataset on an OMERO server using Docker. It takes the target dataset, source file path, Docker volume path, server address, username, and password as input.
Parameters: target_ds (str): The target dataset. src (str): The source file path. docker_vol_path (str): The Docker volume path. Default is ‘/NETSTORE_omero-import’. server (str): The server address. Default is ‘localhost’. username (str): The username. Default is ‘inplace’. password (str): The password. Default is ‘omero’.
- src.io_functions.process_rspace_documents(documents, db_name, table_name)[source]
Process RSpace documents and insert them into the database.
Parameters: documents (dict): A dictionary containing the RSpace documents. db_name (str): The name of the database. table_name (str): The name of the table to insert the documents into.
- src.io_functions.process_rspace_folder(folder, db_name, table_name)[source]
Process RSpace folders and insert them into the database.
Parameters: folder (dict): A dictionary containing the RSpace folders. db_name (str): The name of the database. table_name (str): The name of the table to insert the folders into.
- src.io_functions.process_tags(db_name, table_name, doc, object_document)[source]
Process the tags of an RSpace document and insert them into the database.
Parameters: db_name (str): The name of the database. table_name (str): The name of the table to insert the tags into. doc (dict): The RSpace document. object_document (dict): The object document corresponding to the RSpace document.
- src.io_functions.remove_stopwords(text: str) str [source]
This function takes a string as input and removes stopwords from it. The function returns the string with stopwords removed.
Parameters: text (str): The input string.
Returns: str: The string with stopwords removed.
- src.io_functions.save_df(file_path: str, df: DataFrame) None [source]
This function takes a file path and a DataFrame as input and saves the DataFrame to a CSV file at the specified path.
Parameters: file_path (str): The path to the file. df (pd.DataFrame): The DataFrame to save.
- src.io_functions.search_documents(doc_name, apikey='e7HV1YLT8BvhM2dSoVvzIyVK0svNexJq', base_url='https://rstest.int.lin-magdeburg.de/api/v1')[source]
This function searches for a document in RSpace.
Parameters: doc_name (str): The name of the document. apikey (str): The API key. Default is ‘e7HV1YLT8BvhM2dSoVvzIyVK0svNexJq’. base_url (str): The base URL. Default is ‘https://rstest.int.lin-magdeburg.de/api/v1’.
Returns: dict: The document information if found, otherwise 1.
- src.io_functions.search_folder(folder_name, apikey='e7HV1YLT8BvhM2dSoVvzIyVK0svNexJq', base_url='https://rstest.int.lin-magdeburg.de/api/v1')[source]
This function searches for a folder in RSpace.
Parameters: folder_name (str): The name of the folder. apikey (str): The API key. Default is ‘e7HV1YLT8BvhM2dSoVvzIyVK0svNexJq’. base_url (str): The base URL. Default is ‘https://rstest.int.lin-magdeburg.de/api/v1’.
Returns: dict: The folder information if found, otherwise 1.
src.io_metadata module
- src.io_metadata.extract_metadata(file, outputfolder, showinfPath, showinfParameter)[source]
This function extracts metadata from a file and saves it to an output folder.
Parameters: file (str): The path to the file. outputfolder (str): The path to the output folder. showinfPath (str): The path to the showinf tool. showinfParameter (str): The parameters for the showinf tool.
Returns: list: A list containing the extraction error, input file, output file, extension, and output folder.
- src.io_metadata.get_bf_metadata(fileinput, showinfPath, showinfParameter)[source]
This function extracts metadata from a file using the Bio-Formats showinf tool.
Parameters: fileinput (str): The path to the file. showinfPath (str): The path to the showinf tool. showinfParameter (str): The parameters for the showinf tool.
Returns: bytes: The metadata extracted from the file.
- src.io_metadata.get_init()[source]
This function reads the showinf parameters from a configuration file.
Returns: tuple: A tuple containing the showinf path and parameters.
- src.io_metadata.get_inputlist(folder)[source]
This function returns a list of files in a folder.
Parameters: folder (str): The path to the folder.
Returns: list: A list of files in the folder.
- src.io_metadata.is_tar_archive(file)[source]
This function checks if a file is a tar archive.
Parameters: file (str): The path to the file.
Returns: bool: True if the file is a tar archive, False otherwise.
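The standard library already offers a tar check, `tarfile.is_tarfile`, which a function like this may well wrap. The sketch below assumes that approach and adds a guard for missing paths; `looks_like_tar` is a hypothetical name.

```python
import os
import tarfile


def looks_like_tar(file: str) -> bool:
    """Return True if file exists and tarfile can read it as an archive."""
    # is_tarfile raises for paths that do not exist, so check that first.
    return os.path.isfile(file) and tarfile.is_tarfile(file)
```

`tarfile.is_tarfile` also accepts compressed archives such as .tar.gz, because it tries the supported compression formats when opening the file.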
- src.io_metadata.process_tar_gz(file_path, outputfolder, tmp=1)[source]
This function processes a tar.gz file, extracts metadata from its contents, and saves the results to an output folder.
Parameters: file_path (str): The path to the tar.gz file. outputfolder (str): The path to the output folder. tmp (int): A flag indicating whether to use a temporary directory for extraction. Default is 1.
Returns: DataFrame: A DataFrame containing the extraction results.
- src.io_metadata.save_metadata(metadata, outputfolder, filename)[source]
This function saves metadata to a file.
Parameters: metadata (str): The metadata to save. outputfolder (str): The path to the output folder. filename (str): The name of the output file.
src.llm_response module
- src.llm_response.get_base_prompt(kind='short to long')[source]
Get the base prompt for the model based on the kind of task.
Parameters: kind (str): The type of task (‘short to long’).
Returns: str: The base prompt for the model.
- src.llm_response.get_mistral_response(prompt, local=True)[source]
Get a response from the Mistral model based on the provided prompt.
Parameters: prompt (str): The input prompt for the model. local (bool): If True, use a local model. If False, use the Mistral API.
Returns: str: The response generated by the model.