Database Performance Troubleshoot Notebook

Intro

This notebook provides a Python script that simplifies common performance analysis tasks for database management.

  1. The script loads query definitions from a CSV file exposed on a public URL

  2. Executes the SQL queries against the selected database

  3. Exports the results to searchable HTML tables and uploads an archive of the generated HTML files, with an index, to the Stage area

  4. Handles Stage area operations using the SingleStore Python client, which uses the SingleStore Management API

  5. By simplifying these tasks, the script streamlines workflows for administrators and developers alike

What you will learn in this notebook:

  1. How to read a CSV file (DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv) from a URL and load the data into a pandas DataFrame [Python]

  2. How to execute queries and export the results to HTML files [Python]

  3. How to use the SingleStore Python client for database operations and the Stage area [Python]

What benefits do you get out of using this notebook?

  1. You can run the most commonly used performance checks

  2. Results are exported to HTML for easier viewing

  3. Along with analysis of known scenarios, the script also provides background and possible actions to take

Questions?

Reach out to us through our forum.

Pre-requisites

We will need the following parameters to proceed.

  1. SingleStore Management API key. Follow this link for the API key

  2. Directory path of the Stage area (target location to upload the archive)

  3. URL to download the CSV file

  4. URL of the result template directory

Note: For simplicity of the demo, we use a publicly accessible URL here; adapt the access pattern to suit your needs. A sketch of supplying the Management API key follows below.
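If you prefer to supply the Management API key programmatically instead of relying on the environment configured by the notebook runtime, the following is a minimal sketch. The environment variable name and the workspace-manager call follow the singlestoredb client's conventions; the key value is a hypothetical placeholder.

import os
import singlestoredb as s2

# Hypothetical placeholder -- substitute your own Management API key.
os.environ['SINGLESTOREDB_MANAGEMENT_TOKEN'] = '<your-management-api-key>'

# The workspace manager picks up the token from the environment.
workspace_manager = s2.manage_workspaces()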

CSV File structure

The CSV file has the following columns: QueryID, QueryName, QueryTxt
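For illustration, a hypothetical row in that layout (not taken from the real file referenced by query_data_url) and how it maps onto a pandas DataFrame:

import io
import pandas as pd

# Hypothetical sample matching the expected column layout.
sample_csv = (
    "QueryID,QueryName,QueryTxt\n"
    '1,Long Running Queries,"SELECT * FROM information_schema.processlist WHERE time > 60"\n'
)
sample_df = pd.read_csv(io.StringIO(sample_csv))
print(sample_df.columns.tolist())   # ['QueryID', 'QueryName', 'QueryTxt']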

Install Libraries

In [1]:

!pip install openpyxl jsonpath_ng sql_metadata

Imports

In [2]:

import io
import logging
import os
import shutil
import tarfile
import time
import json
import re
import csv
import pandas as pd
import singlestoredb as s2
from pathlib import Path
from urllib.request import urlopen
from jsonpath_ng import parse
from sql_metadata import Parser
from urllib.error import HTTPError
from datetime import datetime
from openpyxl import Workbook
from IPython.display import display, HTML

Variables

In [3]:

query_data_url = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv"
template_url_base = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/templates/"
stage_folder_path = 'DBPERF-REPORT'
my_timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
local_output_dir_suffix = '_' + my_timestamp + '_PERF_REPORT'
empty_result_table = '<p class="mb-3 mt-3" style="text-align:center;color:blue;">No Matching Records Found</p>'
result_table_html_classes = 'table table-striped table-bordered table-responsive my-2 px-2'
WORKGROUP_ID = os.getenv('SINGLESTOREDB_WORKSPACE_GROUP')
HEADERS = ["PLAN_ID", "DATABASE_NAME", "SQL_QUERY","SUGGESTION","CMP_EXP",
"LEFT_TABLE", "LEFT_COLUMN", "LEFT_TYPE", "LEFT_TABLE_ROW_COUNT",
"RIGHT_TABLE", "RIGHT_COLUMN", "RIGHT_TYPE", "RIGHT_TABLE_ROW_COUNT" ]
s2_workgroup_stage = None
s2_workspace_name = None
row_count_parser = parse("$.*.rowcount")
table_row_count_cache = {}
MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY = """
WITH mismatched_comp
AS (SELECT plan_id,
database_name,
table_col AS mis_cmp
FROM information_schema.plancache
cross join TABLE(JSON_TO_ARRAY(
plan_info :: type_mismatched_comparisons)
) HAVING plan_warnings like '%%Comparisons between mismatched datatypes%%' ),
mismatche_cmp_2
AS (SELECT plan_id,
database_name,
mis_cmp ::$ comparison_expression AS cmp_exp,
mis_cmp ::$ left_type AS left_type,
mis_cmp ::$ right_type AS right_type
FROM mismatched_comp),
plan_tbl_row_counts
AS (SELECT plan_id,
database_name,
optimizer_notes ::$ table_row_counts AS tbl_row_counts,
query_text
FROM information_schema.plancache)
SELECT m2.plan_id as PLAN_ID,
m2.database_name as DATABASE_NAME ,
m2.cmp_exp as CMP_EXP,
m2.left_type as LEFT_TYPE,
m2.right_type as RIGHT_TYPE,
p2.tbl_row_counts as TBL_ROW_COUNTS,
p2.query_text AS SQL_QUERY
FROM mismatche_cmp_2 m2,
plan_tbl_row_counts p2
WHERE m2.plan_id = p2.plan_id
AND m2.database_name = p2.database_name ;
"""

Log Control

In [4]:

def set_logging_enabled(enabled):
    """
    Set the logging level based on the enabled flag.

    Parameters
    ----------
    enabled : bool
        True to enable logging, False to disable it.
    """
    if enabled:
        logging.getLogger().setLevel(logging.INFO)
    else:
        logging.getLogger().setLevel(logging.CRITICAL)

set_logging_enabled(False)

Note: To enable logs

  • Change 'set_logging_enabled(False)' to 'set_logging_enabled(True)' in the code above

Functions to display various alerts

In [5]:

def show_warn(warn_msg):
    """
    Display a warning message in a formatted HTML alert box.

    Parameters
    ----------
    warn_msg : str
        The warning message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>{warn_msg}</p>
    </div>
</div>'''))

def show_error(error_msg):
    """
    Display an error message in a formatted HTML alert box.

    Parameters
    ----------
    error_msg : str
        The error message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-danger">
    <b class="fa fa-solid fa-exclamation-triangle"></b>
    <div>
        <p><b>Error</b></p>
        <p>{error_msg}</p>
    </div>
</div>'''))

def show_success(success_msg):
    """
    Display a success message in a formatted HTML alert box.

    Parameters
    ----------
    success_msg : str
        The success message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-success">
    <b class="fa fa-solid fa-check-circle"></b>
    <div>
        <p><b>Success</b></p>
        <p>{success_msg}</p>
    </div>
</div>'''))

Utility functions handling db connection and archiving

In [6]:

def execute_query(dbcon, query_txt):
    """
    Execute a SQL query on the specified database connection.

    Parameters
    ----------
    dbcon : connection
        The database connection object.
    query_txt : str
        The SQL query to execute.

    Returns
    -------
    list
        A list of rows returned by the query.
    """
    try:
        with dbcon.cursor() as cur:
            cur.execute(query_txt)
            return cur.fetchall()
    except Exception as e:
        logging.error(f"Failed to execute query: {e}")
        raise Exception('Failed to execute query')

def make_tarfile(output_filename, source_dir):
    """
    Create a tar.gz archive of a directory.

    Parameters
    ----------
    output_filename : str
        The name of the output archive file.
    source_dir : str
        The path to the directory to archive.

    Returns
    -------
    bool
        True if the archive was created successfully, False otherwise.
    """
    try:
        with tarfile.open(output_filename, "w:gz") as tar:
            tar.add(source_dir, arcname=os.path.basename(source_dir))
        time.sleep(2)
        file_stats = os.stat(output_filename)
        logging.info(f'{output_filename} has size {(file_stats.st_size / (1024 * 1024))} mb')
        return True
    except Exception as e:
        logging.error(f'Failed to create archive: {e}')
        raise Exception(f'Failed to create archive: {e}')
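As a quick illustration of these helpers (a minimal sketch, assuming a database is already selected in the notebook so that s2.connect() works), you could run a trivial query and archive a scratch directory:

# Minimal usage sketch for the helpers above.
demo_conn = s2.connect(results_type='dict')
print(execute_query(demo_conn, 'SELECT 1 AS ok'))   # e.g. [{'ok': 1}]

os.makedirs('scratch_report_dir', exist_ok=True)
make_tarfile('scratch_report_dir.tar.gz', 'scratch_report_dir')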

Utility functions handling HTML generation

In [7]:

def copy_index_file(p_local_dir_path, p_templ_base_url):
    """
    Copy the index file to the local directory.

    Parameters
    ----------
    p_local_dir_path : str
        The local directory path.
    p_templ_base_url : str
        The base URL for the template.
    """
    index_file = f'{p_local_dir_path}/index.html'
    # Fetch the content of the index template from a URL
    index_file_content = fetch_url_content(p_templ_base_url + 'index.template.html')
    # Write the index file content to a local file
    with open(index_file, 'w') as writer:
        writer.write(str(index_file_content))
    logging.info('Index page is generated')

def generate_html_list(links):
    """
    Generate an HTML ordered list from a comma-separated list of links.

    Parameters
    ----------
    links : str
        A comma-separated list of links.

    Returns
    -------
    str
        The HTML formatted ordered list.
    """
    if 'nan' == links:
        return ''
    html_list = '<ol>'
    for item in links.split(','):
        html_list += f'<li><a href="{item}">{item}</a></li>'
    html_list += '</ol>'
    return html_list

def generate_stage_link(title, stg_path, curr_file_path):
    """
    Generate an HTML link to a stage area.

    Parameters
    ----------
    title : str
        The text to display before the link.
    stg_path : str
        The path to the stage area.
    curr_file_path : str
        The current file path.

    Returns
    -------
    str
        The HTML formatted link.
    """
    url = f"https://portal.singlestore.com/organizations/{os.environ['SINGLESTOREDB_ORGANIZATION']}/workspaces/{os.environ['SINGLESTOREDB_WORKSPACE_GROUP']}#stage/{stg_path}"
    return f"""<div style=\"text-align:center;margin-top:5px; margin-bottom:5px;\">
    {title}
    STAGE Link &nbsp;&nbsp;&nbsp;&nbsp; <a href='{url}'> {curr_file_path} </a>
    </div>"""

Function loading query data in CSV

In [8]:

def fetch_url_content(url):
    """
    Fetch the content of a URL.

    Parameters
    ----------
    url : str
        The URL to fetch.

    Returns
    -------
    str
        The content of the URL.
    """
    try:
        with urlopen(url) as response:
            if response.status == 200:
                my_bytes = response.read()
                file_content = my_bytes.decode("utf8")
                return file_content
    except HTTPError as e:
        logging.error(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')
        raise Exception(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')

def load_query_data(url):
    """
    Load CSV data from a URL into a pandas DataFrame.

    Parameters
    ----------
    url : str
        The URL of the CSV file.

    Returns
    -------
    pandas.DataFrame
        The loaded DataFrame.
    """
    csv_file_content = fetch_url_content(url)
    csv_df = pd.read_csv(io.StringIO(csv_file_content), sep=",",
                         dtype={'QueryID': int, 'QueryName': str, 'QueryTxt': str, 'QueryParams': str})
    csv_df.sort_values(by=['QueryID'], inplace=True)
    return csv_df
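As a sanity check (a minimal sketch; the driver cell at the end performs the same load), you can preview the query definitions before generating the full report:

queries_df = load_query_data(url=query_data_url)
print(len(queries_df), 'queries loaded')
queries_df.head()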

Verify Stage Path and Create if not exists

In [9]:

def verify_stage_area():
    """
    Verify the existence and writability of a stage area.

    Returns
    -------
    bool
        True if the stage area is valid, False otherwise.
    """
    try:
        global s2_workgroup_stage, s2_workspace_name
        my_workspace_mngr = s2.manage_workspaces()
        workspace_group = my_workspace_mngr.get_workspace_group(WORKGROUP_ID)
        s2_workspace_name = my_workspace_mngr.get_workspace(os.environ['SINGLESTOREDB_WORKSPACE']).name
        stage_obj = workspace_group.stage.mkdir(stage_path=stage_folder_path, overwrite=False)
        logging.info(
            f'Stage Path {stage_folder_path} is ok. Is Directory: {stage_obj.is_dir()}. Is Writeable: {stage_obj.writable}')
        if stage_obj.is_dir() and stage_obj.writable:
            s2_workgroup_stage = workspace_group.stage
            logging.info(f'stage is valid: {s2_workgroup_stage is not None}')
            return True
        else:
            logging.error('Provided stage path is either not a directory or not writable.')
            return False
    except Exception as stage_ex:
        logging.error(f'Stage Path Verification Failed. {stage_ex}')
        return False

Functions to analyze data type mismatch

In [10]:

def parse_table_row_counts(plan_id, json_data):
    """
    Extract table names and their corresponding rowcounts from a JSON string.

    Args:
        plan_id: The plan id, used for logging.
        json_data (str): The JSON string containing table names and rowcounts.

    Returns:
        dict: A dictionary containing table names as keys and their rowcounts as values.
    """
    try:
        if not json_data:
            logging.warning(f'Plan id: {plan_id}, Optimizer notes does not contain row count element')
            return None
        # Extract rowcounts for each table
        matches = row_count_parser.find(json.loads(json_data))
        # Create a dictionary to store table names and rowcounts
        table_rowcounts = {}
        for match in matches:
            # Extract the table name from the JSONPath match
            table_name = match.full_path.left.fields[0]
            rowcount = match.value
            table_rowcounts[table_name] = rowcount
        return table_rowcounts
    except json.JSONDecodeError as e:
        # Raise an error if JSON parsing fails
        logging.error("Invalid JSON data: " + str(e))
        raise ValueError("Invalid JSON data: " + str(e))
def fetch_table_row_counts(database_name, table_name):
    """
    Fetch the row count for a given table from the database.

    Args:
        database_name (str): The name of the database.
        table_name (str): The name of the table.

    Returns:
        int: The row count of the table, or -1 if the row count cannot be determined.
    """
    global table_row_count_cache
    if database_name:
        lookup_key = f'{database_name}.{table_name}'
        # Check if the row count is already cached
        if lookup_key in table_row_count_cache.keys():
            return table_row_count_cache[lookup_key]
        else:
            logging.warning(f'{lookup_key} is missing in cache, will fetch and update cache')
            result = None
            with s2.connect(results_type='dict').cursor() as cursor:
                # Fetch the row count from the database
                cursor.execute(f"select sum(rows) as total_rows from information_schema.table_statistics "
                               f"where database_name = '{database_name}' and table_name = '{table_name}'")
                result = cursor.fetchone()
            # Check if the result is None
            if result is None:
                logging.warning(f'RowCounts missing for database:{database_name}, table: {table_name}')
                return -1
            else:
                # Cache the row count and return it
                table_row_count_cache[lookup_key] = result['total_rows']
                logging.info(f"fetched rowcount: {result['total_rows']}")
                return result['total_rows']
    else:
        logging.warning(f"database field empty, so returning -1 for table:{table_name}")
        return -1
def fetch_rowcount(json_data):
    """
    Extract table names and their corresponding rowcounts from a JSON string.

    Args:
        json_data (str): The JSON string containing table names and rowcounts.

    Returns:
        dict: A dictionary containing table names as keys and their rowcounts as values.
    """
    try:
        # Parse the JSON string
        data = json.loads(json_data)
        # Define the JSONPath expression to extract rowcounts
        expr = parse("$.*.rowcount")
        # Extract rowcounts for each table
        matches = expr.find(data)
        # Create a dictionary to store table names and rowcounts
        table_rowcounts = {}
        for match in matches:
            # Extract the table name from the JSONPath match
            table_name = match.full_path.left.fields[0]
            rowcount = match.value
            table_rowcounts[table_name] = rowcount
        return table_rowcounts
    except json.JSONDecodeError as e:
        # Raise an error if JSON parsing fails
        logging.error("Invalid JSON data: " + str(e))
        raise ValueError("Invalid JSON data: " + str(e))
def extract_table_columns(comparison_expression, table_aliases):
    """
    Extract left and right table columns from a comparison expression.

    Args:
        comparison_expression (str): The comparison expression.
        table_aliases (dict): A dictionary mapping table aliases to actual table names.

    Returns:
        tuple: A tuple containing the left and right table columns if they exist,
        or None if only one side is present in the expression.
    """
    try:
        # exp like "(`mq`.`keyword` = `mt`.`year`)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = resolve_table_names((match.group(3) if match.group(3) else ""), table_aliases)
            right_col = (match.group(4) if match.group(4) else "")
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` > 0)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ([\d.]+)\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` > '0')"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.+')\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "( 0 < `mq`.`keyword`)"
        match = re.match(r"\(([\d.]+) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)
        if match:
            left_tab = ""
            left_col = ""
            right_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)
            right_col = (match.group(3) if match.group(3) else "")
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` = NULL)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ (.*?)\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` = 'NULL')"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.*?')\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "( DATE_FORMAT(`mq`.`record_date`,'%Y') = `mt`.`year`)"
        match = re.match(r"\( ([A-Za-z_]+)\(`([^`]+)`\.`([^`]+)`\,('.+')\) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)",
                         comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)
            left_col = (match.group(3) if match.group(3) else "")
            right_tab = resolve_table_names((match.group(5) if match.group(5) else ""), table_aliases)
            right_col = (match.group(6) if match.group(6) else "")
            return left_tab, left_col, right_tab, right_col
        return None, None, None, None
    except Exception as ce:
        logging.error(f"Error extracting table columns from '{comparison_expression}': {ce}")
        raise
def resolve_table_names(table_alias, table_alias_dict):
    """
    Resolve the actual table name from a given table alias.

    Args:
        table_alias (str): The table alias to resolve.
        table_alias_dict (dict): A dictionary mapping table aliases to actual table names.

    Returns:
        str: The resolved table name.
    """
    try:
        if table_alias in table_alias_dict:
            return table_alias_dict[table_alias]
        elif re.match(r'.*_[0-9]+$', table_alias):
            return table_alias[:-2]
        else:
            return table_alias
    except Exception as e:
        logging.error(f"Error resolving table name for alias '{table_alias}': {e}")
        raise
def analyze_comparison(row):
    """
    Analyze a comparison expression in a row and generate a suggestion based on table row counts.

    Args:
        row (dict): A dictionary representing a row of data with keys 'DATABASE_NAME', 'CMP_EXP', 'LEFT_TABLE',
            'RIGHT_TABLE', and 'TBL_ROW_COUNTS'.

    Returns:
        dict: The input row dictionary updated with 'SUGGESTION', 'LEFT_TABLE_ROW_COUNT', and 'RIGHT_TABLE_ROW_COUNT'.
    """
    try:
        db_name = row['DATABASE_NAME']
        cmp_element = row['CMP_EXP']
        left_table = row['LEFT_TABLE']
        right_table = row['RIGHT_TABLE']
        # rowcount_dict = fetch_rowcount(row['TBL_ROW_COUNTS'])
        rowcount_dict = parse_table_row_counts(row['PLAN_ID'], row['TBL_ROW_COUNTS'])
        if rowcount_dict is None:
            rowcount_dict = {}
            if left_table:
                rowcount_dict[f'{db_name}.{left_table}'] = fetch_table_row_counts(db_name, left_table)
            if right_table:
                rowcount_dict[f'{db_name}.{right_table}'] = fetch_table_row_counts(db_name, right_table)
        suggestion = io.StringIO()
        suggestion.write(f'For Expression: {cmp_element}. ')
        left_row_count = None
        right_row_count = None
        if left_table is not None:
            left_lookup_key = (left_table if db_name in left_table else db_name + '.' + left_table)
            left_row_count = rowcount_dict.get(left_lookup_key, -1)
        if right_table is not None:
            right_lookup_key = (right_table if db_name in right_table else db_name + '.' + right_table)
            right_row_count = rowcount_dict.get(right_lookup_key, -1)
        if left_row_count is not None and right_row_count is not None:
            if left_row_count < right_row_count:
                suggestion.write(f"{left_table} has fewer records, consider table size while optimizing.")
            elif left_row_count > right_row_count:
                suggestion.write(f"{right_table} has fewer records, consider table size while optimizing.")
            else:
                suggestion.write("The number of records is equal on both sides of the expression.")
        else:
            suggestion.write(f"Unable to determine row counts for comparison: {cmp_element}.")
        row['SUGGESTION'] = suggestion.getvalue()
        row['LEFT_TABLE_ROW_COUNT'] = left_row_count
        row['RIGHT_TABLE_ROW_COUNT'] = right_row_count
        return row
    except Exception as e:
        logging.error(f"Error analyzing comparison for row '{row}': {e}")
        raise
def generate_datatype_mismatch_rep(conn, output_file, batch_size):
    """
    Process database records fetched using a SQL query and generate an Excel report.

    Args:
        conn: The database connection object.
        output_file (str): The file path to write.
        batch_size (int): The database query fetch size.

    Returns:
        bool: True if a report was written, False if there were no records to write.
    """
    try:
        logging.info("Starting database record processing...")
        lines = []
        total_records_processed = 0
        data_prep_start = time.perf_counter()
        with conn.cursor() as cursor:
            cursor.execute(MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY)
            while True:
                database_records = cursor.fetchmany(batch_size)
                logging.info(f"fetched {len(database_records)} rows")
                if not database_records:
                    break
                for row in database_records:
                    try:
                        parser = Parser(row['SQL_QUERY'])
                        row['LEFT_TABLE'], row['LEFT_COLUMN'], row['RIGHT_TABLE'], row[
                            'RIGHT_COLUMN'] = extract_table_columns(row['CMP_EXP'], parser.tables_aliases)
                        row = analyze_comparison(row)
                        lines.append(row)
                        total_records_processed += 1
                        if total_records_processed % 1000 == 0:
                            logging.info(f"Processed {total_records_processed} records...")
                    except Exception as ex:
                        logging.debug(f"while processing record: {row}")
                        logging.error(ex)
        logging.debug(f"total Processed {total_records_processed} records")
        logging.info(f'Data Preparation took {(time.perf_counter() - data_prep_start):.4f} seconds')
        if total_records_processed > 0:
            report_write_start = time.perf_counter()
            wb = Workbook()
            ws = wb.active
            ws.append(list(HEADERS))
            for line in lines:
                ws.append(list(line.values()))
            wb.save(output_file)
            logging.info("Writing to report completed")
            logging.info(f'Report Writing took {(time.perf_counter() - report_write_start):.4f} seconds')
            return True
        else:
            logging.info('No records to write, skipping report')
            return False
    except Exception as e:
        logging.error(f"An error occurred during database record processing: {e}")
        raise
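To make the alias handling concrete, here is a small illustration (a minimal sketch using a hypothetical comparison expression and alias map, not taken from real plancache output) of how extract_table_columns and resolve_table_names work together:

# Hypothetical comparison expression and alias map, for illustration only.
aliases = {'mq': 'my_queries', 'mt': 'my_totals'}
print(extract_table_columns("(`mq`.`keyword` = `mt`.`year`)", aliases))
# Expected, given the first regex pattern above:
# ('my_queries', 'keyword', 'my_totals', 'year')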

Process dataframe and generate reports for each query in csv

In [11]:

def process_dataframe(query_csv_dataframe):
    """
    Process a DataFrame containing query data.

    Parameters
    ----------
    query_csv_dataframe : pandas.DataFrame
        The DataFrame containing query data.
    """
    excel_report_file = f'{local_dir_path}/perf_troubleshoot_report.xlsx'
    with pd.ExcelWriter(excel_report_file, engine="openpyxl") as xlwriter:
        for idx, row in query_csv_dataframe.astype(str).iterrows():
            query_id = row['QueryID']
            query_name = row['QueryName']
            query = row['QueryTxt']
            logging.debug(f'about to execute {query_name}')
            xlwriter.book.create_sheet(query_name[:30])
            try:
                # Execute the query
                result = execute_query(conn, query)
                logging.info(f"Fetched query ID: {query_id} NAME: {query_name}")
                # Fetch the template for the result page
                template = fetch_url_content(template_url_base + 'Result-' + str(query_id) + '.template.html')
                if not result:
                    # If the result is empty, replace the template placeholder with an empty table
                    final_content = template.replace('rstable', empty_result_table)
                else:
                    # If the result is not empty, format it as an HTML table and replace the template placeholder
                    result_df = pd.DataFrame(result)
                    result_df.columns = map(str.upper, result_df.columns)
                    result_table_id = 'rstbl'
                    result_table_content = result_df.to_html(table_id=result_table_id,
                                                             index=False,
                                                             classes=result_table_html_classes)
                    final_content = template.replace('rstable', result_table_content)
                    result_df.to_excel(xlwriter, sheet_name=query_name, index=False)
                # Write the final content to an HTML file
                report_file = f'{local_dir_path}/{query_id}.html'
                with open(report_file, 'w') as writer:
                    writer.write(final_content)
            except Exception as curr_iter_err:
                # If an exception occurs during query execution, log the error and show a warning message
                logging.error(f"Error executing query ID: {query_id}, NAME: {query_name}: {curr_iter_err}")
                logging.exception("Exception details")
                show_warn(f"Error executing query ID: {query_id}, NAME: {query_name}")
            logging.info(f'process completed for ID:{query_id} Name:{query_name}')
    logging.info('Result Pages are generated')
    logging.info('Excel Report perf_troubleshoot_report.xlsx is generated')

Function to clean up generated directories

In [12]:

def clean_dir(p_dir_path, p_archive_file_path):
    """
    Clean the local directory by removing all HTML files and the archive file.

    Parameters
    ----------
    p_dir_path : str
        The path to the local directory.
    p_archive_file_path : str
        The path to the archive file.
    """
    # Remove the archive file and the generated local files
    try:
        os.remove(p_archive_file_path)
        logging.info('Local archive file removed')
        logging.info('about to clean previously generated files in local dir')
        shutil.rmtree(p_dir_path)
    except OSError as e:
        logging.error('Clean up failed')
        print("Error: %s : %s" % (p_dir_path, e.strerror))
        raise Exception(f'Failed to clean up {str(e)}')

In [13]:

if connection_url.endswith('/'):
    show_warn('Database not selected. Please select one from the dropdown at the top of the web page')
else:
    execution_success = True
    final_file_path = None
    error_msg = None
    try:
        if verify_stage_area():
            # Establish a database connection, use dict as results_type
            conn = s2.connect(results_type='dict')
            logging.info('Database connection established')
            # Load query data from a csv file into a pandas DataFrame.
            queries_df = load_query_data(url=query_data_url)
            logging.info('Query Data loaded')
            # Create a local directory for storing result files
            local_dir_path = (s2_workspace_name + local_output_dir_suffix)
            path = Path(local_dir_path)
            path.mkdir(exist_ok=True)
            process_dataframe(queries_df)
            mismatch_report_file_path = f"{local_dir_path}/datatype-mismatch-comparison-report.xlsx"
            generate_datatype_mismatch_rep(conn, output_file=mismatch_report_file_path, batch_size=1000)
            copy_index_file(local_dir_path, template_url_base)
            # Create a tar.gz archive of the result files
            final_file_path = s2_workspace_name + '_PERF_REPORT_' + my_timestamp + '.tar.gz'
            zip_success = make_tarfile(final_file_path, local_dir_path)
            logging.info('archive created')
            # Upload the archive to the stage area
            if zip_success:
                try:
                    uploaded_obj = s2_workgroup_stage.upload_file(local_path=final_file_path,
                                                                  stage_path=f'{stage_folder_path}/{final_file_path}')
                    logging.info(f'Upload success. Path: {uploaded_obj.abspath()} ')
                    print(f'File uploaded to STAGE AREA: {uploaded_obj.abspath()}')
                    logging.info('Upload success')
                except Exception as e:
                    # If an exception occurs during the upload process, log the error
                    execution_success = False
                    logging.error(f'Failed during upload process: {e}')
                    error_msg = 'File Upload failed'
                clean_dir(local_dir_path, final_file_path)
                logging.info('Local files cleaned')
            else:
                # If creating the archive fails, set execution_success to False and log the error
                logging.error('Failed to create archive')
                execution_success = False
                error_msg = 'Failed to create archive'
        else:
            # If verifying the stage area fails, set execution_success to False and log the error
            logging.info("Stage Area Verification Failed. Exiting.")
            print('Script execution Failed')
            execution_success = False
            error_msg = 'Failed to create missing stage area path or it is not writeable'
    except Exception as e:
        execution_success = False
        logging.error(f"An error occurred: {e}")
        logging.exception("Exception details")
        error_msg = f'Exception occurred. {str(e)}'
    # Display a success or error message based on the execution success
    if execution_success:
        # show_success("Files are uploaded to Stage")
        show_success(generate_stage_link('Upload to stage success, File: ', stage_folder_path, final_file_path))
    else:
        show_error(error_msg)
    logging.info(f'Script execution completed successfully: {execution_success}')

Important NOTE

  • The suggested actions suit most performance-improvement scenarios; still, we encourage you to test and verify them before applying them to production environments

  • To run this notebook on a schedule, modify the Python code to read its configuration from a table instead of user input (a minimal sketch follows)
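The sketch below illustrates that change; it assumes a hypothetical configuration table (perf_notebook_config, with key/value columns) that you would create and populate yourself, and it would replace the hard-coded variables near the top of the notebook.

# Hypothetical config table:
# CREATE TABLE perf_notebook_config (config_key TEXT, config_value TEXT);
def load_config_from_table():
    with s2.connect(results_type='dict') as conf_conn:
        with conf_conn.cursor() as cur:
            cur.execute('SELECT config_key, config_value FROM perf_notebook_config')
            return {r['config_key']: r['config_value'] for r in cur.fetchall()}

# cfg = load_config_from_table()
# query_data_url    = cfg['query_data_url']
# template_url_base = cfg['template_url_base']
# stage_folder_path = cfg['stage_folder_path']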

Details

Tags

#troubleshoot #performance #tuning #starter

License

This Notebook has been released under the Apache 2.0 open source license.