Database Performance Troubleshoot Notebook

Intro

This notebook provides a Python script that simplifies common performance analysis tasks for database management.

  1. The script loads query definitions from a CSV file exposed on a public URL

  2. Executes the SQL queries against the selected database

  3. Exports the results to searchable HTML tables and uploads an archive of the generated HTML files, with an index, to the Stage area

  4. Handles Stage area operations using the SingleStore Python client, which uses the SingleStore Management API

  5. By simplifying these tasks, the script streamlines workflows for administrators and developers alike

What you will learn in this notebook:

  1. How to read a CSV file (DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv) from a URL and load the data into a pandas DataFrame [Python]

  2. How to execute queries and export the results to HTML files [Python]

  3. How to use the SingleStore Python client for database operations and the Stage area [Python]

What benefits do you get out of using this notebook?

  1. You can run the most commonly used performance checks

  2. Results are exported to HTML for easier viewing

  3. Along with analysis of known scenarios, the script also provides background and possible actions to take

Questions?

Reach out to us through our forum.

Pre-requisites

We will need the following parameters to proceed.

  1. SingleStore Management API key. Follow this link for the API key

  2. Directory path of the Stage area (target location to upload the archive)

  3. URL to download the CSV file

  4. URL of the result template directory

Note: For simplicity of the demo, we use a publicly accessible URL here; adapt the access pattern to suit your needs. A sketch of supplying the Management API key follows below.
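If you prefer to supply the Management API key programmatically instead of relying on the environment configured by the notebook runtime, the following is a minimal sketch. The environment variable name and the workspace-manager call follow the singlestoredb client's conventions; the key value is a hypothetical placeholder.

import os
import singlestoredb as s2

# Hypothetical placeholder -- substitute your own Management API key.
os.environ['SINGLESTOREDB_MANAGEMENT_TOKEN'] = '<your-management-api-key>'

# The workspace manager picks up the token from the environment.
workspace_manager = s2.manage_workspaces()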

CSV File structure

The CSV file has the following columns: QueryID, QueryName, QueryTxt
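For illustration, a hypothetical row in that layout (not taken from the real file referenced by query_data_url) and how it maps onto a pandas DataFrame:

import io
import pandas as pd

# Hypothetical sample matching the expected column layout.
sample_csv = (
    "QueryID,QueryName,QueryTxt\n"
    '1,Long Running Queries,"SELECT * FROM information_schema.processlist WHERE time > 60"\n'
)
sample_df = pd.read_csv(io.StringIO(sample_csv))
print(sample_df.columns.tolist())   # ['QueryID', 'QueryName', 'QueryTxt']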

Install Libraries

In [1]:

!pip install openpyxl jsonpath_ng sql_metadata

Imports

In [2]:

import io
import logging
import os
import shutil
import tarfile
import time
import json
import re
import csv
import pandas as pd
import singlestoredb as s2
from pathlib import Path
from urllib.request import urlopen
from jsonpath_ng import parse
from sql_metadata import Parser
from urllib.error import HTTPError
from datetime import datetime
from openpyxl import Workbook
from IPython.display import display, HTML

Variables

In [3]:

query_data_url = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv"
template_url_base = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/templates/"
stage_folder_path = 'DBPERF-REPORT'
my_timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
local_output_dir_suffix = '_' + my_timestamp + '_PERF_REPORT'
empty_result_table = '<p class="mb-3 mt-3" style="text-align:center;color:blue;">No Matching Records Found</p>'
result_table_html_classes = 'table table-striped table-bordered table-responsive my-2 px-2'
WORKGROUP_ID = os.getenv('SINGLESTOREDB_WORKSPACE_GROUP')
HEADERS = ["PLAN_ID", "DATABASE_NAME", "SQL_QUERY","SUGGESTION","CMP_EXP",
"LEFT_TABLE", "LEFT_COLUMN", "LEFT_TYPE", "LEFT_TABLE_ROW_COUNT",
"RIGHT_TABLE", "RIGHT_COLUMN", "RIGHT_TYPE", "RIGHT_TABLE_ROW_COUNT" ]
s2_workgroup_stage = None
s2_workspace_name = None
row_count_parser = parse("$.*.rowcount")
table_row_count_cache = {}
MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY = """
WITH mismatched_comp
AS (SELECT plan_id,
database_name,
table_col AS mis_cmp
FROM information_schema.plancache
cross join TABLE(JSON_TO_ARRAY(
plan_info :: type_mismatched_comparisons)
) HAVING plan_warnings like '%%Comparisons between mismatched datatypes%%' ),
mismatche_cmp_2
AS (SELECT plan_id,
database_name,
mis_cmp ::$ comparison_expression AS cmp_exp,
mis_cmp ::$ left_type AS left_type,
mis_cmp ::$ right_type AS right_type
FROM mismatched_comp),
plan_tbl_row_counts
AS (SELECT plan_id,
database_name,
optimizer_notes ::$ table_row_counts AS tbl_row_counts,
query_text
FROM information_schema.plancache)
SELECT m2.plan_id as PLAN_ID,
m2.database_name as DATABASE_NAME ,
m2.cmp_exp as CMP_EXP,
m2.left_type as LEFT_TYPE,
m2.right_type as RIGHT_TYPE,
p2.tbl_row_counts as TBL_ROW_COUNTS,
p2.query_text AS SQL_QUERY
FROM mismatche_cmp_2 m2,
plan_tbl_row_counts p2
WHERE m2.plan_id = p2.plan_id
AND m2.database_name = p2.database_name ;
"""

Log Control

In [4]:

def set_logging_enabled(enabled):
    """
    Set the logging level based on the enabled flag.

    Parameters
    ----------
    enabled : bool
        True to enable logging, False to disable it.
    """
    if enabled:
        logging.getLogger().setLevel(logging.INFO)
    else:
        logging.getLogger().setLevel(logging.CRITICAL)

set_logging_enabled(False)

Note: To enable logs

  • Change 'set_logging_enabled(False)' to 'set_logging_enabled(True)' in the code above

Functions to display various alerts

In [5]:

def show_warn(warn_msg):
    """
    Display a warning message in a formatted HTML alert box.

    Parameters
    ----------
    warn_msg : str
        The warning message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>{warn_msg}</p>
    </div>
</div>'''))

def show_error(error_msg):
    """
    Display an error message in a formatted HTML alert box.

    Parameters
    ----------
    error_msg : str
        The error message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-danger">
    <b class="fa fa-solid fa-exclamation-triangle"></b>
    <div>
        <p><b>Error</b></p>
        <p>{error_msg}</p>
    </div>
</div>'''))

def show_success(success_msg):
    """
    Display a success message in a formatted HTML alert box.

    Parameters
    ----------
    success_msg : str
        The success message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-success">
    <b class="fa fa-solid fa-check-circle"></b>
    <div>
        <p><b>Success</b></p>
        <p>{success_msg}</p>
    </div>
</div>'''))

Utility functions handling db connection and archiving

In [6]:

def execute_query(dbcon, query_txt):
    """
    Execute a SQL query on the specified database connection.

    Parameters
    ----------
    dbcon : connection
        The database connection object.
    query_txt : str
        The SQL query to execute.

    Returns
    -------
    list
        A list of rows returned by the query.
    """
    try:
        with dbcon.cursor() as cur:
            cur.execute(query_txt)
            return cur.fetchall()
    except Exception as e:
        logging.error(f"Failed to execute query: {e}")
        raise Exception('Failed to execute query')

def make_tarfile(output_filename, source_dir):
    """
    Create a tar.gz archive of a directory.

    Parameters
    ----------
    output_filename : str
        The name of the output archive file.
    source_dir : str
        The path to the directory to archive.

    Returns
    -------
    bool
        True if the archive was created successfully, False otherwise.
    """
    try:
        with tarfile.open(output_filename, "w:gz") as tar:
            tar.add(source_dir, arcname=os.path.basename(source_dir))
        time.sleep(2)
        file_stats = os.stat(output_filename)
        logging.info(f'{output_filename} has size {(file_stats.st_size / (1024 * 1024))} mb')
        return True
    except Exception as e:
        logging.error(f'Failed to create archive: {e}')
        raise Exception(f'Failed to create archive: {e}')
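As a quick illustration of these helpers (a minimal sketch, assuming a database is already selected in the notebook so that s2.connect() works), you could run a trivial query and archive a scratch directory:

# Minimal usage sketch for the helpers above.
demo_conn = s2.connect(results_type='dict')
print(execute_query(demo_conn, 'SELECT 1 AS ok'))   # e.g. [{'ok': 1}]

os.makedirs('scratch_report_dir', exist_ok=True)
make_tarfile('scratch_report_dir.tar.gz', 'scratch_report_dir')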

Utility functions handling HTML generation

In [7]:

def copy_index_file(p_local_dir_path, p_templ_base_url):
    """
    Copy the index file to the local directory.

    Parameters
    ----------
    p_local_dir_path : str
        The local directory path.
    p_templ_base_url : str
        The base URL for the template.
    """
    index_file = f'{p_local_dir_path}/index.html'
    # Fetch the content of the index template from a URL
    index_file_content = fetch_url_content(p_templ_base_url + 'index.template.html')
    # Write the index file content to a local file
    with open(index_file, 'w') as writer:
        writer.write(str(index_file_content))
    logging.info('Index page is generated')

def generate_html_list(links):
    """
    Generate an HTML ordered list from a comma-separated list of links.

    Parameters
    ----------
    links : str
        A comma-separated list of links.

    Returns
    -------
    str
        The HTML formatted ordered list.
    """
    if 'nan' == links:
        return ''
    html_list = '<ol>'
    for item in links.split(','):
        html_list += f'<li><a href="{item}">{item}</a></li>'
    html_list += '</ol>'
    return html_list

def generate_stage_link(title, stg_path, curr_file_path):
    """
    Generate an HTML link to a stage area.

    Parameters
    ----------
    title : str
        The text to display before the link.
    stg_path : str
        The path to the stage area.
    curr_file_path : str
        The current file path.

    Returns
    -------
    str
        The HTML formatted link.
    """
    url = f"https://portal.singlestore.com/organizations/{os.environ['SINGLESTOREDB_ORGANIZATION']}/workspaces/{os.environ['SINGLESTOREDB_WORKSPACE_GROUP']}#stage/{stg_path}"
    return f"""<div style=\"text-align:center;margin-top:5px; margin-bottom:5px;\">
    {title}
    STAGE Link &nbsp;&nbsp;&nbsp;&nbsp; <a href='{url}'> {curr_file_path} </a>
    </div>"""

Function loading query data in CSV

In [8]:

def fetch_url_content(url):
    """
    Fetch the content of a URL.

    Parameters
    ----------
    url : str
        The URL to fetch.

    Returns
    -------
    str
        The content of the URL.
    """
    try:
        with urlopen(url) as response:
            if response.status == 200:
                my_bytes = response.read()
                file_content = my_bytes.decode("utf8")
                return file_content
    except HTTPError as e:
        logging.error(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')
        raise Exception(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')

def load_query_data(url):
    """
    Load CSV data from a URL into a pandas DataFrame.

    Parameters
    ----------
    url : str
        The URL of the CSV file.

    Returns
    -------
    pandas.DataFrame
        The loaded DataFrame.
    """
    csv_file_content = fetch_url_content(url)
    csv_df = pd.read_csv(io.StringIO(csv_file_content), sep=",",
                         dtype={'QueryID': int, 'QueryName': str, 'QueryTxt': str, 'QueryParams': str})
    csv_df.sort_values(by=['QueryID'], inplace=True)
    return csv_df
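As a sanity check (a minimal sketch; the driver cell at the end performs the same load), you can preview the query definitions before generating the full report:

queries_df = load_query_data(url=query_data_url)
print(len(queries_df), 'queries loaded')
queries_df.head()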

Verify Stage Path and Create if not exists

In [9]:

def verify_stage_area():
    """
    Verify the existence and writability of a stage area.

    Returns
    -------
    bool
        True if the stage area is valid, False otherwise.
    """
    try:
        global s2_workgroup_stage, s2_workspace_name
        my_workspace_mngr = s2.manage_workspaces()
        workspace_group = my_workspace_mngr.get_workspace_group(WORKGROUP_ID)
        s2_workspace_name = my_workspace_mngr.get_workspace(os.environ['SINGLESTOREDB_WORKSPACE']).name
        stage_obj = workspace_group.stage.mkdir(stage_path=stage_folder_path, overwrite=False)
        logging.info(
            f'Stage Path {stage_folder_path} is ok. Is Directory: {stage_obj.is_dir()}. Is Writeable: {stage_obj.writable}')
        if stage_obj.is_dir() and stage_obj.writable:
            s2_workgroup_stage = workspace_group.stage
            logging.info(f'stage is valid: {s2_workgroup_stage is not None}')
            return True
        else:
            logging.error('Provided stage path is either not a directory or not writable.')
            return False
    except Exception as stage_ex:
        logging.error(f'Stage Path Verification Failed. {stage_ex}')
        return False

Functions to analyze data type mismatch

In [10]:

def parse_table_row_counts(plan_id, json_data):
    """
    Extract table names and their corresponding rowcounts from a JSON string.

    Args:
        plan_id: The plan id, used for logging.
        json_data (str): The JSON string containing table names and rowcounts.

    Returns:
        dict: A dictionary containing table names as keys and their rowcounts as values.
    """
    try:
        if not json_data:
            logging.warning(f'Plan id: {plan_id}, Optimizer notes does not contain row count element')
            return None
        # Extract rowcounts for each table
        matches = row_count_parser.find(json.loads(json_data))
        # Create a dictionary to store table names and rowcounts
        table_rowcounts = {}
        for match in matches:
            # Extract the table name from the JSONPath match
            table_name = match.full_path.left.fields[0]
            rowcount = match.value
            table_rowcounts[table_name] = rowcount
        return table_rowcounts
    except json.JSONDecodeError as e:
        # Raise an error if JSON parsing fails
        logging.error("Invalid JSON data: " + str(e))
        raise ValueError("Invalid JSON data: " + str(e))
def fetch_table_row_counts(database_name, table_name):
    """
    Fetch the row count for a given table from the database.

    Args:
        database_name (str): The name of the database.
        table_name (str): The name of the table.

    Returns:
        int: The row count of the table, or -1 if the row count cannot be determined.
    """
    global table_row_count_cache
    if database_name:
        lookup_key = f'{database_name}.{table_name}'
        # Check if the row count is already cached
        if lookup_key in table_row_count_cache.keys():
            return table_row_count_cache[lookup_key]
        else:
            logging.warning(f'{lookup_key} is missing in cache, will fetch and update cache')
            result = None
            with s2.connect(results_type='dict').cursor() as cursor:
                # Fetch the row count from the database
                cursor.execute(f"select sum(rows) as total_rows from information_schema.table_statistics "
                               f"where database_name = '{database_name}' and table_name = '{table_name}'")
                result = cursor.fetchone()
            # Check if the result is None
            if result is None:
                logging.warning(f'RowCounts missing for database:{database_name}, table: {table_name}')
                return -1
            else:
                # Cache the row count and return it
                table_row_count_cache[lookup_key] = result['total_rows']
                logging.info(f"fetched rowcount: {result['total_rows']}")
                return result['total_rows']
    else:
        logging.warning(f"database field empty, so returning -1 for table:{table_name}")
        return -1
def fetch_rowcount(json_data):
    """
    Extract table names and their corresponding rowcounts from a JSON string.

    Args:
        json_data (str): The JSON string containing table names and rowcounts.

    Returns:
        dict: A dictionary containing table names as keys and their rowcounts as values.
    """
    try:
        # Parse the JSON string
        data = json.loads(json_data)
        # Define the JSONPath expression to extract rowcounts
        expr = parse("$.*.rowcount")
        # Extract rowcounts for each table
        matches = expr.find(data)
        # Create a dictionary to store table names and rowcounts
        table_rowcounts = {}
        for match in matches:
            # Extract the table name from the JSONPath match
            table_name = match.full_path.left.fields[0]
            rowcount = match.value
            table_rowcounts[table_name] = rowcount
        return table_rowcounts
    except json.JSONDecodeError as e:
        # Raise an error if JSON parsing fails
        logging.error("Invalid JSON data: " + str(e))
        raise ValueError("Invalid JSON data: " + str(e))
def extract_table_columns(comparison_expression, table_aliases):
    """
    Extract left and right table columns from a comparison expression.

    Args:
        comparison_expression (str): The comparison expression.
        table_aliases (dict): A dictionary mapping table aliases to actual table names.

    Returns:
        tuple: A tuple containing the left and right table columns if they exist,
        or None if only one side is present in the expression.
    """
    try:
        # exp like "(`mq`.`keyword` = `mt`.`year`)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = resolve_table_names((match.group(3) if match.group(3) else ""), table_aliases)
            right_col = (match.group(4) if match.group(4) else "")
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` > 0)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ([\d.]+)\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` > '0')"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.+')\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "( 0 < `mq`.`keyword`)"
        match = re.match(r"\(([\d.]+) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)
        if match:
            left_tab = ""
            left_col = ""
            right_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)
            right_col = (match.group(3) if match.group(3) else "")
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` = NULL)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ (.*?)\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` = 'NULL')"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.*?')\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "( DATE_FORMAT(`mq`.`record_date`,'%Y') = `mt`.`year`)"
        match = re.match(r"\( ([A-Za-z_]+)\(`([^`]+)`\.`([^`]+)`\,('.+')\) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)",
                         comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)
            left_col = (match.group(3) if match.group(3) else "")
            right_tab = resolve_table_names((match.group(5) if match.group(5) else ""), table_aliases)
            right_col = (match.group(6) if match.group(6) else "")
            return left_tab, left_col, right_tab, right_col
        return None, None, None, None
    except Exception as ce:
        logging.error(f"Error extracting table columns from '{comparison_expression}': {ce}")
        raise
def resolve_table_names(table_alias, table_alias_dict):
    """
    Resolve the actual table name from a given table alias.

    Args:
        table_alias (str): The table alias to resolve.
        table_alias_dict (dict): A dictionary mapping table aliases to actual table names.

    Returns:
        str: The resolved table name.
    """
    try:
        if table_alias in table_alias_dict:
            return table_alias_dict[table_alias]
        elif re.match(r'.*_[0-9]+$', table_alias):
            return table_alias[:-2]
        else:
            return table_alias
    except Exception as e:
        logging.error(f"Error resolving table name for alias '{table_alias}': {e}")
        raise
def analyze_comparison(row):
    """
    Analyze a comparison expression in a row and generate a suggestion based on table row counts.

    Args:
        row (dict): A dictionary representing a row of data with keys 'DATABASE_NAME', 'CMP_EXP', 'LEFT_TABLE',
            'RIGHT_TABLE', and 'TBL_ROW_COUNTS'.

    Returns:
        dict: The input row dictionary updated with 'SUGGESTION', 'LEFT_TABLE_ROW_COUNT', and 'RIGHT_TABLE_ROW_COUNT'.
    """
    try:
        db_name = row['DATABASE_NAME']
        cmp_element = row['CMP_EXP']
        left_table = row['LEFT_TABLE']
        right_table = row['RIGHT_TABLE']
        # rowcount_dict = fetch_rowcount(row['TBL_ROW_COUNTS'])
        rowcount_dict = parse_table_row_counts(row['PLAN_ID'], row['TBL_ROW_COUNTS'])
        if rowcount_dict is None:
            rowcount_dict = {}
            if left_table:
                rowcount_dict[f'{db_name}.{left_table}'] = fetch_table_row_counts(db_name, left_table)
            if right_table:
                rowcount_dict[f'{db_name}.{right_table}'] = fetch_table_row_counts(db_name, right_table)
        suggestion = io.StringIO()
        suggestion.write(f'For Expression: {cmp_element}. ')
        left_row_count = None
        right_row_count = None
        if left_table is not None:
            left_lookup_key = (left_table if db_name in left_table else db_name + '.' + left_table)
            left_row_count = rowcount_dict.get(left_lookup_key, -1)
        if right_table is not None:
            right_lookup_key = (right_table if db_name in right_table else db_name + '.' + right_table)
            right_row_count = rowcount_dict.get(right_lookup_key, -1)
        if left_row_count is not None and right_row_count is not None:
            if left_row_count < right_row_count:
                suggestion.write(f"{left_table} has fewer records, consider table size while optimizing.")
            elif left_row_count > right_row_count:
                suggestion.write(f"{right_table} has fewer records, consider table size while optimizing.")
            else:
                suggestion.write("The number of records is equal on both sides of the expression.")
        else:
            suggestion.write(f"Unable to determine row counts for comparison: {cmp_element}.")
        row['SUGGESTION'] = suggestion.getvalue()
        row['LEFT_TABLE_ROW_COUNT'] = left_row_count
        row['RIGHT_TABLE_ROW_COUNT'] = right_row_count
        return row
    except Exception as e:
        logging.error(f"Error analyzing comparison for row '{row}': {e}")
        raise
def generate_datatype_mismatch_rep(conn, output_file, batch_size):
    """
    Process database records fetched using a SQL query and generate an Excel report.

    Args:
        conn: The database connection object.
        output_file (str): The file path to write.
        batch_size (int): The database query fetch size.

    Returns:
        bool: True if a report was written, False if there were no records to write.
    """
    try:
        logging.info("Starting database record processing...")
        lines = []
        total_records_processed = 0
        data_prep_start = time.perf_counter()
        with conn.cursor() as cursor:
            cursor.execute(MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY)
            while True:
                database_records = cursor.fetchmany(batch_size)
                logging.info(f"fetched {len(database_records)} rows")
                if not database_records:
                    break
                for row in database_records:
                    try:
                        parser = Parser(row['SQL_QUERY'])
                        row['LEFT_TABLE'], row['LEFT_COLUMN'], row['RIGHT_TABLE'], row[
                            'RIGHT_COLUMN'] = extract_table_columns(row['CMP_EXP'], parser.tables_aliases)
                        row = analyze_comparison(row)
                        lines.append(row)
                        total_records_processed += 1
                        if total_records_processed % 1000 == 0:
                            logging.info(f"Processed {total_records_processed} records...")
                    except Exception as ex:
                        logging.debug(f"while processing record: {row}")
                        logging.error(ex)
        logging.debug(f"total Processed {total_records_processed} records")
        logging.info(f'Data Preparation took {(time.perf_counter() - data_prep_start):.4f} seconds')
        if total_records_processed > 0:
            report_write_start = time.perf_counter()
            wb = Workbook()
            ws = wb.active
            ws.append(list(HEADERS))
            for line in lines:
                ws.append(list(line.values()))
            wb.save(output_file)
            logging.info("Writing to report completed")
            logging.info(f'Report Writing took {(time.perf_counter() - report_write_start):.4f} seconds')
            return True
        else:
            logging.info('No records to write, skipping report')
            return False
    except Exception as e:
        logging.error(f"An error occurred during database record processing: {e}")
        raise
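To make the alias handling concrete, here is a small illustration (a minimal sketch using a hypothetical comparison expression and alias map, not taken from real plancache output) of how extract_table_columns and resolve_table_names work together:

# Hypothetical comparison expression and alias map, for illustration only.
aliases = {'mq': 'my_queries', 'mt': 'my_totals'}
print(extract_table_columns("(`mq`.`keyword` = `mt`.`year`)", aliases))
# Expected, given the first regex pattern above:
# ('my_queries', 'keyword', 'my_totals', 'year')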

Process dataframe and generate reports for each query in csv

In [11]:

def process_dataframe(query_csv_dataframe):
    """
    Process a DataFrame containing query data.

    Parameters
    ----------
    query_csv_dataframe : pandas.DataFrame
        The DataFrame containing query data.
    """
    excel_report_file = f'{local_dir_path}/perf_troubleshoot_report.xlsx'
    with pd.ExcelWriter(excel_report_file, engine="openpyxl") as xlwriter:
        for idx, row in query_csv_dataframe.astype(str).iterrows():
            query_id = row['QueryID']
            query_name = row['QueryName']
            query = row['QueryTxt']
            logging.debug(f'about to execute {query_name}')
            xlwriter.book.create_sheet(query_name[:30])
            try:
                # Execute the query
                result = execute_query(conn, query)
                logging.info(f"Fetched query ID: {query_id} NAME: {query_name}")
                # Fetch the template for the result page
                template = fetch_url_content(template_url_base + 'Result-' + str(query_id) + '.template.html')
                if not result:
                    # If the result is empty, replace the template placeholder with an empty table
                    final_content = template.replace('rstable', empty_result_table)
                else:
                    # If the result is not empty, format it as an HTML table and replace the template placeholder
                    result_df = pd.DataFrame(result)
                    result_df.columns = map(str.upper, result_df.columns)
                    result_table_id = 'rstbl'
                    result_table_content = result_df.to_html(table_id=result_table_id,
                                                             index=False,
                                                             classes=result_table_html_classes)
                    final_content = template.replace('rstable', result_table_content)
                    result_df.to_excel(xlwriter, sheet_name=query_name, index=False)
                # Write the final content to an HTML file
                report_file = f'{local_dir_path}/{query_id}.html'
                with open(report_file, 'w') as writer:
                    writer.write(final_content)
            except Exception as curr_iter_err:
                # If an exception occurs during query execution, log the error and show a warning message
                logging.error(f"Error executing query ID: {query_id}, NAME: {query_name}: {curr_iter_err}")
                logging.exception("Exception details")
                show_warn(f"Error executing query ID: {query_id}, NAME: {query_name}")
            logging.info(f'process completed for ID:{query_id} Name:{query_name}')
    logging.info('Result Pages are generated')
    logging.info('Excel Report perf_troubleshoot_report.xlsx is generated')

Function to clean up generated directories

In [12]:

def clean_dir(p_dir_path, p_archive_file_path):
    """
    Clean the local directory by removing all HTML files and the archive file.

    Parameters
    ----------
    p_dir_path : str
        The path to the local directory.
    p_archive_file_path : str
        The path to the archive file.
    """
    # Remove the archive file and the generated local files
    try:
        os.remove(p_archive_file_path)
        logging.info('Local archive file removed')
        logging.info('about to clean previously generated files in local dir')
        shutil.rmtree(p_dir_path)
    except OSError as e:
        logging.error('Clean up failed')
        print("Error: %s : %s" % (p_dir_path, e.strerror))
        raise Exception(f'Failed to clean up {str(e)}')

In [13]:

if connection_url.endswith('/'):
    show_warn('Database not selected. Please select one from the dropdown at the top of the web page')
else:
    execution_success = True
    final_file_path = None
    error_msg = None
    try:
        if verify_stage_area():
            # Establish a database connection, use dict as results_type
            conn = s2.connect(results_type='dict')
            logging.info('Database connection established')
            # Load query data from a csv file into a pandas DataFrame.
            queries_df = load_query_data(url=query_data_url)
            logging.info('Query Data loaded')
            # Create a local directory for storing result files
            local_dir_path = (s2_workspace_name + local_output_dir_suffix)
            path = Path(local_dir_path)
            path.mkdir(exist_ok=True)
            process_dataframe(queries_df)
            mismatch_report_file_path = f"{local_dir_path}/datatype-mismatch-comparison-report.xlsx"
            generate_datatype_mismatch_rep(conn, output_file=mismatch_report_file_path, batch_size=1000)
            copy_index_file(local_dir_path, template_url_base)
            # Create a tar.gz archive of the result files
            final_file_path = s2_workspace_name + '_PERF_REPORT_' + my_timestamp + '.tar.gz'
            zip_success = make_tarfile(final_file_path, local_dir_path)
            logging.info('archive created')
            # Upload the archive to the stage area
            if zip_success:
                try:
                    uploaded_obj = s2_workgroup_stage.upload_file(local_path=final_file_path,
                                                                  stage_path=f'{stage_folder_path}/{final_file_path}')
                    logging.info(f'Upload success. Path: {uploaded_obj.abspath()} ')
                    print(f'File uploaded to STAGE AREA: {uploaded_obj.abspath()}')
                    logging.info('Upload success')
                except Exception as e:
                    # If an exception occurs during the upload process, log the error
                    execution_success = False
                    logging.error(f'Failed during upload process: {e}')
                    error_msg = 'File Upload failed'
                clean_dir(local_dir_path, final_file_path)
                logging.info('Local files cleaned')
            else:
                # If creating the archive fails, set execution_success to False and log the error
                logging.error('Failed to create archive')
                execution_success = False
                error_msg = 'Failed to create archive'
        else:
            # If verifying the stage area fails, set execution_success to False and log the error
            logging.info("Stage Area Verification Failed. Exiting.")
            print('Script execution Failed')
            execution_success = False
            error_msg = 'Failed to create missing stage area path or it is not writeable'
    except Exception as e:
        execution_success = False
        logging.error(f"An error occurred: {e}")
        logging.exception("Exception details")
        error_msg = f'Exception occurred. {str(e)}'
    # Display a success or error message based on the execution success
    if execution_success:
        # show_success("Files are uploaded to Stage")
        show_success(generate_stage_link('Upload to stage success, File: ', stage_folder_path, final_file_path))
    else:
        show_error(error_msg)
    logging.info(f'Script execution completed successfully: {execution_success}')

Important NOTE

  • The suggested actions suit most performance-improvement scenarios; still, we encourage you to test and verify them before applying them to production environments

  • To run this notebook on a schedule, modify the Python code to read its configuration from a table instead of user input (a minimal sketch follows)
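The sketch below illustrates that change; it assumes a hypothetical configuration table (perf_notebook_config, with key/value columns) that you would create and populate yourself, and it would replace the hard-coded variables near the top of the notebook.

# Hypothetical config table:
# CREATE TABLE perf_notebook_config (config_key TEXT, config_value TEXT);
def load_config_from_table():
    with s2.connect(results_type='dict') as conf_conn:
        with conf_conn.cursor() as cur:
            cur.execute('SELECT config_key, config_value FROM perf_notebook_config')
            return {r['config_key']: r['config_value'] for r in cur.fetchall()}

# cfg = load_config_from_table()
# query_data_url    = cfg['query_data_url']
# template_url_base = cfg['template_url_base']
# stage_folder_path = cfg['stage_folder_path']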

Details

Tags

#troubleshoot #performance #tuning #starter

License

This Notebook has been released under the Apache 2.0 open source license.