Database Performance Troubleshoot Notebook
Intro
Introducing a powerful Python script designed to ease performance analysis tasks for database management.
This script loads query information from a CSV file exposed at a public URL
Executes the SQL queries against the selected database
Exports the results to searchable HTML tables and uploads an archive of the generated HTML files, with an index, into the Stage area
Handles Stage area operations using the SingleStore Python client, which uses the SingleStore Management API
By simplifying these complex tasks, this script streamlines workflows for administrators and developers alike
What you will learn in this notebook:
How to read a CSV file and load the data into pandas DataFrames [Python]: download the DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv file from its URL
Execute queries and export the results into HTML files [Python]
Use of the SingleStore client for database operations and the Stage area [Python]
What benefits do you get out of using this notebook?
Run the most commonly used performance checks
Results are exported into HTML for easier review
Along with analysis of known scenarios, the script also provides background and possible actions to take
Questions?
Reach out to us through our forum.
Pre-requisites
We will need the following parameters to proceed.
SingleStore Management API key. Follow this link to create an API key
Directory path in the Stage area (target location to upload the archive)
URL to download the CSV file
URL of the result template directory
Note: You may use the
DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv file as a template to add your own queries.
templates directory as templates for the results.
For simplicity of the demo, we are using a publicly accessible URL here; you will have to adapt the access pattern to suit your needs.
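If your CSV lives behind authentication instead, a minimal sketch of fetching it with a bearer token might look like the following (the URL and AUTH_TOKEN are hypothetical placeholders, not part of this notebook):

from urllib.request import Request, urlopen

AUTH_TOKEN = 'my-secret-token'  # hypothetical; read from a secret store or environment variable
req = Request(
    'https://example.com/private/DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv',
    headers={'Authorization': f'Bearer {AUTH_TOKEN}'},
)
with urlopen(req) as response:
    csv_text = response.read().decode('utf8')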
CSV File structure
QueryID | QueryName | QueryTxt
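For illustration, a data row could look like this (a made-up example, not one of the shipped queries):

1 | ACTIVE_PROCESSES | SELECT * FROM information_schema.processlist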
Install Libraries
In [1]:
pip install openpyxl jsonpath_ng sql_metadata
Imports
In [2]:
import io
import logging
import os
import shutil
import tarfile
import time
import json
import re
import csv

import pandas as pd
import singlestoredb as s2
from pathlib import Path
from urllib.request import urlopen
from jsonpath_ng import parse
from sql_metadata import Parser
from urllib.error import HTTPError
from datetime import datetime
from openpyxl import Workbook
from IPython.display import display, HTML
Variables
In [3]:
query_data_url = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv"

template_url_base = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/templates/"

stage_folder_path = 'DBPERF-REPORT'

my_timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
local_output_dir_suffix = '_' + my_timestamp + '_PERF_REPORT'

empty_result_table = '<p class="mb-3 mt-3" style="text-align:center;color:blue;">No Matching Records Found</p>'
result_table_html_classes = 'table table-striped table-bordered table-responsive my-2 px-2'

WORKGROUP_ID = os.getenv('SINGLESTOREDB_WORKSPACE_GROUP')

HEADERS = ["PLAN_ID", "DATABASE_NAME", "SQL_QUERY", "SUGGESTION", "CMP_EXP",
           "LEFT_TABLE", "LEFT_COLUMN", "LEFT_TYPE", "LEFT_TABLE_ROW_COUNT",
           "RIGHT_TABLE", "RIGHT_COLUMN", "RIGHT_TYPE", "RIGHT_TABLE_ROW_COUNT"]

s2_workgroup_stage = None
s2_workspace_name = None

row_count_parser = parse("$.*.rowcount")
table_row_count_cache = {}

MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY = """
WITH mismatched_comp AS (
    SELECT plan_id,
           database_name,
           table_col AS mis_cmp
    FROM information_schema.plancache
    CROSS JOIN TABLE(JSON_TO_ARRAY(plan_info :: type_mismatched_comparisons))
    HAVING plan_warnings LIKE '%%Comparisons between mismatched datatypes%%'
),
mismatche_cmp_2 AS (
    SELECT plan_id,
           database_name,
           mis_cmp ::$ comparison_expression AS cmp_exp,
           mis_cmp ::$ left_type AS left_type,
           mis_cmp ::$ right_type AS right_type
    FROM mismatched_comp
),
plan_tbl_row_counts AS (
    SELECT plan_id,
           database_name,
           optimizer_notes ::$ table_row_counts AS tbl_row_counts,
           query_text
    FROM information_schema.plancache
)
SELECT m2.plan_id        AS PLAN_ID,
       m2.database_name  AS DATABASE_NAME,
       m2.cmp_exp        AS CMP_EXP,
       m2.left_type      AS LEFT_TYPE,
       m2.right_type     AS RIGHT_TYPE,
       p2.tbl_row_counts AS TBL_ROW_COUNTS,
       p2.query_text     AS SQL_QUERY
FROM mismatche_cmp_2 m2,
     plan_tbl_row_counts p2
WHERE m2.plan_id = p2.plan_id
  AND m2.database_name = p2.database_name;
"""
Log Control
In [4]:
def set_logging_enabled(enabled):
    """Set the logging level based on the enabled flag.

    Parameters
    ----------
    enabled : bool
        True to enable logging, False to disable it.
    """
    if enabled:
        logging.getLogger().setLevel(logging.INFO)
    else:
        logging.getLogger().setLevel(logging.CRITICAL)

set_logging_enabled(False)
Note: To enable logs
Change 'set_logging_enabled(False)' to 'set_logging_enabled(True)' in the code above.
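For example, the last line of the cell above would then read:

set_logging_enabled(True)  # emit INFO-level progress messages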
Functions to display various alerts
In [5]:
def show_warn(warn_msg):
    """Display a warning message in a formatted HTML alert box.

    Parameters
    ----------
    warn_msg : str
        The warning message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>{warn_msg}</p>
    </div>
</div>'''))


def show_error(error_msg):
    """Display an error message in a formatted HTML alert box.

    Parameters
    ----------
    error_msg : str
        The error message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-danger">
    <b class="fa fa-solid fa-exclamation-triangle"></b>
    <div>
        <p><b>Error</b></p>
        <p>{error_msg}</p>
    </div>
</div>'''))


def show_success(success_msg):
    """Display a success message in a formatted HTML alert box.

    Parameters
    ----------
    success_msg : str
        The success message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-success">
    <b class="fa fa-solid fa-check-circle"></b>
    <div>
        <p><b>Success</b></p>
        <p>{success_msg}</p>
    </div>
</div>'''))
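These helpers are used throughout the notebook; as a quick illustration, they can be called directly with any message string:

show_warn('Database not selected. Please select from dropdown in top of web page')
show_error('Failed to create archive')
show_success('Files are uploaded to Stage')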
Utility functions handling db connection and archiving
In [6]:
def execute_query(dbcon, query_txt):
    """Execute a SQL query on the specified database connection.

    Parameters
    ----------
    dbcon : connection
        The database connection object.
    query_txt : str
        The SQL query to execute.

    Returns
    -------
    list
        A list of rows returned by the query.
    """
    try:
        with dbcon.cursor() as cur:
            cur.execute(query_txt)
            return cur.fetchall()
    except Exception as e:
        logging.error(f"Failed to execute query: {e}")
        raise Exception('Failed to execute query')


def make_tarfile(output_filename, source_dir):
    """Create a tar.gz archive of a directory.

    Parameters
    ----------
    output_filename : str
        The name of the output archive file.
    source_dir : str
        The path to the directory to archive.

    Returns
    -------
    bool
        True if the archive was created successfully, False otherwise.
    """
    try:
        with tarfile.open(output_filename, "w:gz") as tar:
            tar.add(source_dir, arcname=os.path.basename(source_dir))
        time.sleep(2)
        file_stats = os.stat(output_filename)
        logging.info(f'{output_filename} has size {(file_stats.st_size / (1024 * 1024))} mb')
        return True
    except Exception as e:
        logging.error(f'Failed to create archive: {e}')
        raise Exception(f'Failed to create archive: {e}')
Utility functions handling HTML generation
In [7]:
def copy_index_file(p_local_dir_path, p_templ_base_url):
    """Copy the index file to the local directory.

    Parameters
    ----------
    p_local_dir_path : str
        The local directory path.
    p_templ_base_url : str
        The base URL for the template.
    """
    index_file = f'{p_local_dir_path}/index.html'
    # Fetch the content of the index template from a URL
    index_file_content = fetch_url_content(p_templ_base_url + 'index.template.html')
    # Write the index file content to a local file
    with open(index_file, 'w') as writer:
        writer.write(str(index_file_content))
    logging.info('Index Page are generated')


def generate_html_list(links):
    """Generate an HTML ordered list from a comma-separated list of links.

    Parameters
    ----------
    links : str
        A comma-separated list of links.

    Returns
    -------
    str
        The HTML formatted ordered list.
    """
    if 'nan' == links:
        return ''
    html_list = '<ol>'
    for item in links.split(','):
        html_list += f'<li><a href="{item}">{item}</a></li>'
    html_list += '</ol>'
    return html_list


def generate_stage_link(title, stg_path, curr_file_path):
    """Generate an HTML link to a stage area.

    Parameters
    ----------
    stg_path : str
        The path to the stage area.
    curr_file_path : str
        The current file path.

    Returns
    -------
    str
        The HTML formatted link.
    """
    url = f"https://portal.singlestore.com/organizations/{os.environ['SINGLESTOREDB_ORGANIZATION']}/workspaces/{os.environ['SINGLESTOREDB_WORKSPACE_GROUP']}#stage/{stg_path}"
    return f"""<div style=\"text-align:center;margin-top:5px; margin-bottom:5px;\">
        {title} STAGE Link <a href='{url}'> {curr_file_path} </a>
    </div>"""
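As a small illustration (the URLs below are placeholders), generate_html_list turns a comma-separated string into an ordered HTML list:

generate_html_list('https://example.com/a,https://example.com/b')
# returns '<ol><li><a href="https://example.com/a">https://example.com/a</a></li>'
#         '<li><a href="https://example.com/b">https://example.com/b</a></li></ol>'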
Function to load query data from the CSV file
In [8]:
def fetch_url_content(url):
    """Fetch the content of a URL.

    Parameters
    ----------
    url : str
        The URL to fetch.

    Returns
    -------
    str
        The content of the URL.
    """
    try:
        with urlopen(url) as response:
            if response.status == 200:
                my_bytes = response.read()
                file_content = my_bytes.decode("utf8")
                return file_content
    except HTTPError as e:
        logging.error(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')
        raise Exception(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')


def load_query_data(url):
    """Load CSV data from a URL into a pandas DataFrame.

    Parameters
    ----------
    url : str
        The URL of the CSV file.

    Returns
    -------
    pandas.DataFrame
        The loaded DataFrame.
    """
    csv_file_content = fetch_url_content(url)
    csv_df = pd.read_csv(io.StringIO(csv_file_content), sep=",",
                         dtype={'QueryID': int, 'QueryName': str, 'QueryTxt': str, 'QueryParams': str})
    csv_df.sort_values(by=['QueryID'], inplace=True)
    return csv_df
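As a quick sanity check before generating the reports, the loaded DataFrame can be inspected directly (illustrative snippet):

queries_df = load_query_data(url=query_data_url)
print(queries_df[['QueryID', 'QueryName']].head())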
Verify the Stage path and create it if it does not exist
In [9]:
def verify_stage_area():
    """Verify the existence and writability of a stage area.

    Returns
    -------
    bool
        True if the stage area is valid, False otherwise.
    """
    try:
        global s2_workgroup_stage, s2_workspace_name
        my_workspace_mngr = s2.manage_workspaces()
        workspace_group = my_workspace_mngr.get_workspace_group(WORKGROUP_ID)
        s2_workspace_name = my_workspace_mngr.get_workspace(os.environ['SINGLESTOREDB_WORKSPACE']).name
        stage_obj = workspace_group.stage.mkdir(stage_path=stage_folder_path, overwrite=False)
        logging.info(f'Stage Path {stage_folder_path} is ok. Is Directory: {stage_obj.is_dir()}. Is Writeable: {stage_obj.writable}')
        if stage_obj.is_dir() and stage_obj.writable:
            s2_workgroup_stage = workspace_group.stage
            logging.info(f'stage is valid: {s2_workgroup_stage is not None}')
            return True
        else:
            logging.error(f'As provided path is neither directory nor writable.')
            return False
    except Exception as stage_ex:
        logging.error(f'Stage Path Verification Failed. {stage_ex}')
        return False
Functions to analyze data type mismatches
In [10]:
def parse_table_row_counts(plan_id, json_data):
    """Extract table names and their corresponding rowcounts from a JSON string.

    Args:
        plan_id: The plan id, used for logging.
        json_data (str): The JSON string containing table names and rowcounts.

    Returns:
        dict: A dictionary containing table names as keys and their rowcounts as values.
    """
    try:
        if not json_data:
            logging.warning(f'Plan id: {plan_id}, Optimizer notes does not contain row count element')
            return None
        # Extract rowcounts for each table
        matches = row_count_parser.find(json.loads(json_data))
        # Create a dictionary to store table names and rowcounts
        table_rowcounts = {}
        for match in matches:
            # Extract the table name from the JSONPath match
            table_name = match.full_path.left.fields[0]
            rowcount = match.value
            table_rowcounts[table_name] = rowcount
        return table_rowcounts
    except json.JSONDecodeError as e:
        # Raise an error if JSON parsing fails
        logging.error("Invalid JSON data: " + str(e))
        raise ValueError("Invalid JSON data: " + str(e))


def fetch_table_row_counts(database_name, table_name):
    """Fetch the row count for a given table from the database, using a cursor.

    Args:
        database_name (str): The name of the database.
        table_name (str): The name of the table.

    Returns:
        int: The row count of the table, or -1 if the row count cannot be determined.
    """
    global table_row_count_cache
    if database_name:
        lookup_key = f'{database_name}.{table_name}'
        # Check if the row count is already cached
        if lookup_key in table_row_count_cache.keys():
            return table_row_count_cache[lookup_key]
        else:
            logging.warning(f'{lookup_key} is missing in cache, will fetch and update cache')
            result = None
            with s2.connect(results_type='dict').cursor() as cursor:
                # Fetch the row count from the database
                cursor.execute(f"select sum(rows) as total_rows from information_schema.table_statistics "
                               f"where database_name = '{database_name}' and table_name = '{table_name}'")
                result = cursor.fetchone()
                # Check if the result is None
                if result is None:
                    logging.warning(f'RowCounts missing for database:{database_name}, table: {table_name}')
                    return -1
                else:
                    # Cache the row count and return it
                    table_row_count_cache[lookup_key] = result['total_rows']
                    logging.info(f"fetched rowcount: {result['total_rows']}")
                    return result['total_rows']
    else:
        logging.warning(f"database field empty, so returning -1 for table:{table_name}")
        return -1


def fetch_rowcount(json_data):
    """Extract table names and their corresponding rowcounts from a JSON string.

    Args:
        json_data (str): The JSON string containing table names and rowcounts.

    Returns:
        dict: A dictionary containing table names as keys and their rowcounts as values.
    """
    try:
        # Parse the JSON string
        data = json.loads(json_data)
        # Define the JSONPath expression to extract rowcounts
        expr = parse("$.*.rowcount")
        # Extract rowcounts for each table
        matches = expr.find(data)
        # Create a dictionary to store table names and rowcounts
        table_rowcounts = {}
        for match in matches:
            # Extract the table name from the JSONPath match
            table_name = match.full_path.left.fields[0]
            rowcount = match.value
            table_rowcounts[table_name] = rowcount
        return table_rowcounts
    except json.JSONDecodeError as e:
        # Raise an error if JSON parsing fails
        logging.error("Invalid JSON data: " + str(e))
        raise ValueError("Invalid JSON data: " + str(e))


def extract_table_columns(comparison_expression, table_aliases):
    """Extract left and right table columns from a comparison expression.

    Args:
        comparison_expression (str): The comparison expression.
        table_aliases (dict): A mapping of table aliases to actual table names.

    Returns:
        tuple: A tuple containing the left and right table columns if they exist,
        or None values if only one side is present in the expression.
    """
    try:
        # exp like "(`a`.`col1` = `b`.`col2`)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = resolve_table_names((match.group(3) if match.group(3) else ""), table_aliases)
            right_col = (match.group(4) if match.group(4) else "")
            return left_tab, left_col, right_tab, right_col

        # exp like "(`mq`.`keyword` > 0)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ([\d.]+)\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col

        # exp like "(`mq`.`keyword` > '0')"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.+')\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col

        # exp like "( 0 < `mq`.`keyword`)"
        match = re.match(r"\(([\d.]+) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)
        if match:
            left_tab = ""
            left_col = ""
            right_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)
            right_col = (match.group(3) if match.group(3) else "")
            return left_tab, left_col, right_tab, right_col

        # exp like "(`mq`.`keyword` = NULL)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ (.*?)\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col

        # exp like "(`mq`.`keyword` = 'NULL')"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.*?')\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col

        # exp like "( DATE_FORMAT(`mq`.`record_date`,'%Y') = `mt`.`year`)"
        match = re.match(r"\( ([A-Za-z_]+)\(`([^`]+)`\.`([^`]+)`\,('.+')\) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)",
                         comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)
            left_col = (match.group(3) if match.group(3) else "")
            right_tab = resolve_table_names((match.group(5) if match.group(5) else ""), table_aliases)
            right_col = (match.group(6) if match.group(6) else "")
            return left_tab, left_col, right_tab, right_col

        return None, None, None, None
    except Exception as ce:
        logging.error(f"Error extracting table columns from '{comparison_expression}': {ce}")
        raise


def resolve_table_names(table_alias, table_alias_dict):
    """Resolve the actual table name from a given table alias.

    Args:
        table_alias (str): The table alias to resolve.
        table_alias_dict (dict): A dictionary mapping table aliases to actual table names.

    Returns:
        str: The resolved table name.
    """
    try:
        if table_alias in table_alias_dict:
            return table_alias_dict[table_alias]
        elif re.match(r'.*_[0-9]+$', table_alias):
            return table_alias[:-2]
        else:
            return table_alias
    except Exception as e:
        logging.error(f"Error resolving table name for alias '{table_alias}': {e}")
        raise


def analyze_comparison(row):
    """Analyze a comparison expression in a row and generate a suggestion based on table row counts.

    Args:
        row (dict): A dictionary representing a row of data with keys 'DATABASE_NAME', 'CMP_EXP', 'LEFT_TABLE',
            'RIGHT_TABLE', and 'TBL_ROW_COUNTS'.

    Returns:
        dict: The input row dictionary updated with 'SUGGESTION', 'LEFT_TABLE_ROW_COUNT', and 'RIGHT_TABLE_ROW_COUNT'.
    """
    try:
        db_name = row['DATABASE_NAME']
        cmp_element = row['CMP_EXP']
        left_table = row['LEFT_TABLE']
        right_table = row['RIGHT_TABLE']
        # rowcount_dict = fetch_rowcount(row['TBL_ROW_COUNTS'])
        rowcount_dict = parse_table_row_counts(row['PLAN_ID'], row['TBL_ROW_COUNTS'])

        if rowcount_dict is None:
            rowcount_dict = {}
            if left_table:
                rowcount_dict[f'{db_name}.{left_table}'] = fetch_table_row_counts(db_name, left_table)
            if right_table:
                rowcount_dict[f'{db_name}.{right_table}'] = fetch_table_row_counts(db_name, right_table)

        suggestion = io.StringIO()
        suggestion.write(f'For Expression: {cmp_element}. ')
        left_row_count = None
        right_row_count = None

        if left_table is not None:
            left_lookup_key = (left_table if db_name in left_table else db_name + '.' + left_table)
            left_row_count = rowcount_dict.get(left_lookup_key, -1)

        if right_table is not None:
            right_lookup_key = (right_table if db_name in right_table else db_name + '.' + right_table)
            right_row_count = rowcount_dict.get(right_lookup_key, -1)

        if left_row_count is not None and right_row_count is not None:
            if left_row_count < right_row_count:
                suggestion.write(f"{left_table} has fewer records, consider table size while optimizing.")
            elif left_row_count > right_row_count:
                suggestion.write(f"{right_table} has fewer records, consider table size while optimizing.")
            else:
                suggestion.write(f"The number of records is equal on both sides of the expression.")
        else:
            suggestion.write(f"Unable to determine row counts for comparison: {cmp_element}.")

        row['SUGGESTION'] = suggestion.getvalue()
        row['LEFT_TABLE_ROW_COUNT'] = left_row_count
        row['RIGHT_TABLE_ROW_COUNT'] = right_row_count
        return row
    except Exception as e:
        logging.error(f"Error analyzing comparison for row '{row}': {e}")
        raise


def generate_datatype_mismatch_rep(conn, output_file, batch_size):
    """Process database records fetched using a SQL query and generate an excel report.

    Args:
        conn: The database connection object.
        output_file (str): filepath to write.
        batch_size (int): database query fetch size.

    Returns:
        None
    """
    try:
        logging.info("Starting database record processing...")
        lines = []
        total_records_processed = 0
        data_prep_start = time.perf_counter()

        with conn.cursor() as cursor:
            cursor.execute(MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY)
            while True:
                database_records = cursor.fetchmany(batch_size)
                logging.info(f"fetched {len(database_records)} rows")
                if not database_records:
                    break
                for row in database_records:
                    try:
                        parser = Parser(row['SQL_QUERY'])
                        row['LEFT_TABLE'], row['LEFT_COLUMN'], row['RIGHT_TABLE'], row['RIGHT_COLUMN'] = \
                            extract_table_columns(row['CMP_EXP'], parser.tables_aliases)
                        row = analyze_comparison(row)
                        lines.append(row)
                        total_records_processed += 1
                        if total_records_processed % 1000 == 0:
                            logging.info(f"Processed {total_records_processed} records...")
                    except Exception as ex:
                        logging.debug(f"while processing record: {row}")
                        logging.error(ex)

        logging.debug(f"total Processed {total_records_processed} records")
        logging.info(f'Data Preparation took {(time.perf_counter() - data_prep_start):.4f} seconds')

        if total_records_processed > 0:
            report_write_start = time.perf_counter()
            wb = Workbook()
            ws = wb.active
            ws.append(list(HEADERS))
            for line in lines:
                ws.append(list(line.values()))
            wb.save(output_file)
            logging.info("Writing to report completed")
            logging.info(f'Report Writing took {(time.perf_counter() - report_write_start):.4f} seconds')
            return True
        else:
            logging.info('No records to write, skipping report')
            return False
        # logging.info("Database record processing completed.")
    except Exception as e:
        logging.error(f"An error occurred during database record processing: {e}")
        raise
Process the DataFrame and generate a report for each query in the CSV
In [11]:
def process_dataframe(query_csv_dataframe):
    """Process a DataFrame containing query data.

    Parameters
    ----------
    query_csv_dataframe : pandas.DataFrame
        The DataFrame containing query data.
    """
    excel_report_file = f'{local_dir_path}/perf_troubleshoot_report.xlsx'
    with pd.ExcelWriter(excel_report_file, engine="openpyxl") as xlwriter:
        for idx, row in query_csv_dataframe.astype(str).iterrows():
            query_id = row['QueryID']
            query_name = row['QueryName']
            query = row['QueryTxt']
            logging.debug(f'about to execute {query_name}')
            xlwriter.book.create_sheet(query_name[:30])
            try:
                # Execute the query
                result = execute_query(conn, query)
                logging.info(f"Fetched query ID: {query_id} NAME: {query_name}")
                # Fetch the template for the result page
                template = fetch_url_content(template_url_base + 'Result-' + str(query_id) + '.template.html')
                if not result:
                    # If the result is empty, replace the template placeholder with an empty table
                    final_content = template.replace('rstable', empty_result_table)
                else:
                    # If the result is not empty, format it as an HTML table and replace the template placeholder
                    result_df = pd.DataFrame(result)
                    result_df.columns = map(str.upper, result_df.columns)
                    result_table_id = 'rstbl'
                    result_table_content = result_df.to_html(table_id=result_table_id,
                                                             index=False,
                                                             classes=result_table_html_classes)
                    final_content = template.replace('rstable', result_table_content)
                    result_df.to_excel(xlwriter, sheet_name=query_name, index=False)
                # Write the final content to an HTML file
                report_file = f'{local_dir_path}/{query_id}.html'
                with open(report_file, 'w') as writer:
                    writer.write(final_content)
            except Exception as curr_iter_err:
                # If an exception occurs during query execution, log the error and show a warning message
                logging.error(f"Error executing query ID: {query_id}, NAME: {query_name}: {curr_iter_err}")
                logging.exception("Exception details")
                show_warn(f"Error executing query ID: {query_id}, NAME: {query_name}")
            logging.info(f'process completed for ID:{query_id} Name:{query_name}')
    logging.info('Result Pages are generated')
    logging.info(f'Excel Report perf_troubleshoot_report.xlsx is generated')
Function to clean up generated directories
In [12]:
def clean_dir(p_dir_path, p_archive_file_path):
    """Clean the local directory by removing all HTML files and the archive file.

    Parameters
    ----------
    p_dir_path : str
        The path to the local directory.
    p_archive_file_path : str
        The path to the archive file.
    """
    try:
        # Remove the archive file
        os.remove(p_archive_file_path)
        logging.info('Local archive file removed')
        # Remove the previously generated files in the local directory
        logging.info('about to clean previous generated files in local dir')
        shutil.rmtree(p_dir_path)
    except OSError as e:
        logging.error('Clean up failed')
        print("Error: %s : %s" % (p_dir_path, e.strerror))
        raise Exception(f'Failed to clean up {str(e)}')
In [13]:
if connection_url.endswith('/'):
    show_warn('Database not selected. Please select from dropdown in top of web page')
else:
    execution_success = True
    final_file_path = None
    error_msg = None
    try:
        if verify_stage_area():
            # Establish a database connection, use dict as results_type
            conn = s2.connect(results_type='dict')
            logging.info('Database Connection established')

            # Load query data from a csv file into a pandas DataFrame.
            queries_df = load_query_data(url=query_data_url)
            logging.info('Query Data loaded')

            # Create a local directory for storing result files
            local_dir_path = (s2_workspace_name + local_output_dir_suffix)
            path = Path(local_dir_path)
            path.mkdir(exist_ok=True)

            process_dataframe(queries_df)

            mismatch_report_file_path = f"{local_dir_path}/datatype-mismatch-comparision-report.xlsx"
            generate_datatype_mismatch_rep(conn, output_file=mismatch_report_file_path, batch_size=1000)

            copy_index_file(local_dir_path, template_url_base)

            # Create a zip archive of the result files
            final_file_path = s2_workspace_name + '_PERF_REPORT_' + my_timestamp + '.tar.gz'
            zip_success = make_tarfile(final_file_path, local_dir_path)
            logging.info('archive created')

            # Upload the zip archive to the stage area
            if zip_success:
                try:
                    uploaded_obj = s2_workgroup_stage.upload_file(local_path=final_file_path,
                                                                  stage_path=f'{stage_folder_path}/{final_file_path}')
                    logging.info(f'Upload success. Path: {uploaded_obj.abspath()} ')
                    print(f'File uploaded to STAGE AREA: {uploaded_obj.abspath()}')
                    logging.info('Upload success')
                except Exception as e:
                    # If an exception occurs during the upload process, log the error
                    execution_success = False
                    logging.error(f'Failed during upload process {e}')
                    error_msg = 'File Upload failed'

                clean_dir(local_dir_path, final_file_path)
                logging.info('Local files cleaned')
            else:
                # If creating the zip archive fails, set execution_success to False and log the error
                logging.error('Failed to create archive')
                execution_success = False
                error_msg = 'Failed to create archive'
        else:
            # If verifying the stage area fails, set execution_success to False and log the error
            logging.info("Stage Area Verification Failed. Exiting.")
            print('Script execution Failed')
            execution_success = False
            error_msg = 'Failed to create missing stage area path or it is not writeable'
    except Exception as e:
        execution_success = False
        logging.error(f"An error occurred: {e}")
        logging.exception("Exception details")
        error_msg = f'Exception occurred. {str(e)}'

    # Display a success or error message based on the execution success
    if execution_success:
        # show_success("Files are uploaded to Stage")
        show_success(generate_stage_link('Upload to stage success, File: ', stage_folder_path, final_file_path))
    else:
        show_error(error_msg)

    logging.info(f'Script execution completed successfully: {execution_success}')
Important NOTE
The suggested actions suit most performance improvement scenarios. Still, we encourage you to test and verify them before applying them to production environments.
To use this notebook as a scheduled notebook, modify the Python code to read its configuration from a table instead of user input, as sketched below.
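One possible shape for that change is the following sketch; the perf_notebook_config table and its columns are hypothetical and would need to be created and populated beforehand.

import singlestoredb as s2

def load_config_from_table():
    # Hypothetical config table: perf_notebook_config(config_key, config_value)
    with s2.connect(results_type='dict').cursor() as cur:
        cur.execute('SELECT config_key, config_value FROM perf_notebook_config')
        return {r['config_key']: r['config_value'] for r in cur.fetchall()}

# cfg = load_config_from_table()
# query_data_url    = cfg['query_data_url']
# template_url_base = cfg['template_url_base']
# stage_folder_path = cfg['stage_folder_path']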
Details
About this Template
This notebook will help you troubleshoot database performance.
This Notebook can be run in Standard and Enterprise deployments.
Tags
License
This Notebook has been released under the Apache 2.0 open source license.