|
Post by Jason on Oct 19, 2021 8:17:49 GMT -8
Been messing around with possible optimizations of the Dash app that aren't too invasive. In order to identify bottlenecks, I added some code to measure page loading times. The status page titled SkyWeather Status is the first page loaded in a new browser session and is also one of the slowest pages to load. The average loading time for the page was ~3.5 minutes. My use case is slightly different than most as I have a distributed architecture where various infrastructure components are running on different RPis separate from my SkyWeather2 RPi.
I've got MariaDB running on a RPi 1 on my desk. While not the most powerful board obviously, it is still more than sufficient to run MariaDB and serve data to the Dash app assuming the underlying data model is optimized for serving the data. As pointed out by another board member, and through my own testing, the data model is in need of some optimization.
The following database indexes were added to my database:
CREATE OR REPLACE INDEX idx_wr2_ts ON SkyWeather2.WeatherData (TimeStamp);
CREATE OR REPLACE INDEX idx_wsaqi_ts ON WeatherSenseWireless.AQI433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wstb_ts ON WeatherSenseWireless.TB433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wsaf_ts ON WeatherSenseWireless.AS433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_sm_ts ON WeatherSenseWireless.SolarMax433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_iths_ts ON SkyWeather2.IndoorTHSensors (TimeStamp);
As mentioned previously, it was taking ~3.5 minutes to load the status page prior to adding the indexes. After adding the indexes, the page loads in ~1s. Admittedly, some of the time savings are likely due to MariaDB caching the SQL and avoiding additional parsing of SQL it has already seen and successfully parsed. However, savings from the parsing are likely measured in milliseconds, not minutes. I'm going to continue testing and will add to this thread if additional indexes are needed.
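For reference, the kind of load-time measurement described above can be sketched as a simple decorator. This is a hypothetical sketch of my own, not the actual measurement code used in the app; the function name build_status_page is just a stand-in for the real page-building work:

```python
import functools
import time

def timed(func):
    """Print how long the wrapped page-builder takes to run."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print('%s took %.3fs' % (func.__name__, elapsed))
        return result
    return wrapper

@timed
def build_status_page():
    # stand-in for the real Dash page-building work
    time.sleep(0.1)
    return 'status page'

build_status_page()
```

Decorating each page callback this way makes it easy to spot which pages dominate the load time before and after a change.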
Thanks,
Jason
|
|
|
Post by SDL on Oct 20, 2021 14:12:15 GMT -8
Jason,
Makes perfect sense. The timestamp was not indexed. I will add this to our databases. GREAT FIND! I had just never taken the time to look at optimizations.
BP
|
|
|
Post by Jason on Oct 21, 2021 2:08:14 GMT -8
Yeah my mind immediately filled with overly complicated architectures I’ve used in the past such as caching with Redis, etc. Fortunately, I realized anything too invasive was work I just didn’t feel like doing lol (sometimes laziness is a gift 😉) and instead focused on query optimization. I’m still a bit puzzled by the number of times some queries execute, but, for now, they execute pretty quickly.
Jason
|
|
|
Post by Jason on Oct 21, 2021 4:57:40 GMT -8
Looked at the rest of the objects in the two schemas and here is a more complete list of indexes based on timestamps.
CREATE OR REPLACE INDEX idx_iths_ts ON SkyWeather2.IndoorTHSensors (TimeStamp);
CREATE OR REPLACE INDEX idx_ps_ts ON SkyWeather2.PowerSystem (TimeStamp);
CREATE OR REPLACE INDEX idx_sl_ts ON SkyWeather2.SystemLog (TimeStamp);
CREATE OR REPLACE INDEX idx_wsaqi_ts ON WeatherSenseWireless.AQI433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wsaf_ts ON WeatherSenseWireless.AS433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wsg_ts ON WeatherSenseWireless.Generic (TimeStamp);
CREATE OR REPLACE INDEX idx_iths_ts ON WeatherSenseWireless.IndoorTHSensors (TimeStamp);
CREATE OR REPLACE INDEX idx_skp_ts ON WeatherSenseWireless.SkyCamPictures (timestamp);
CREATE OR REPLACE INDEX idx_sks_ts ON WeatherSenseWireless.SkyCamSensors (timestamp);
CREATE OR REPLACE INDEX idx_sm_ts ON WeatherSenseWireless.SolarMax433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wstb_ts ON WeatherSenseWireless.TB433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wr2_ts ON WeatherSenseWireless.WeatherData (TimeStamp);
Jason
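One way to confirm an index is actually being used is to run EXPLAIN on the query. The sketch below uses SQLite's built-in EXPLAIN QUERY PLAN purely for illustration (MariaDB's EXPLAIN output looks different, and the table here is a stripped-down stand-in), but the idea is the same: a range predicate on the indexed TimeStamp column should show the index in the plan:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# minimal stand-in for the real WeatherData table
cur.execute('CREATE TABLE WeatherData (id INTEGER PRIMARY KEY, TimeStamp TEXT)')
cur.execute('CREATE INDEX idx_wr2_ts ON WeatherData (TimeStamp)')

# the plan should reference idx_wr2_ts for a TimeStamp range scan
plan = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM WeatherData WHERE TimeStamp > '2021-10-01'"
).fetchall()
print(plan)
```

In MariaDB the equivalent check is `EXPLAIN SELECT ...`, where the `key` column of the output should list the new index instead of showing a full table scan.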
|
|
|
Post by SDL on Oct 23, 2021 9:01:03 GMT -8
I'm adding this to our new version of SkyWeather2.
I'm also fixing this in SmartGarden3.
Different program, but same problem.
BP
|
|
|
Post by Jason on Nov 12, 2021 5:16:15 GMT -8
Another suggestion that might be of interest. Within indoorth.py, the SQL to retrieve temp/humidity data is called twice due to the structure of the script. Currently, the function generateTHData() is called inside the functions buildTGraph() and buildHGraph(), and those two functions are called from the main entry function IndoorTHPage(). I moved the call to generateTHData() into IndoorTHPage() and now pass the query results to buildTGraph() and buildHGraph() as a parameter, which avoids incurring the extra SQL query.
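The shape of that refactor can be sketched with stand-ins for the real functions. The names follow indoorth.py, but the bodies here are hypothetical (a counter replaces the actual SQL so the single-query behavior is visible):

```python
query_count = 0

def generateTHData():
    """Stand-in for the real SQL query in indoorth.py."""
    global query_count
    query_count += 1
    return [('ch1', 21.5, 45.0)]  # fake (channel, temp, humidity) rows

def buildTGraph(th_data):
    # receives the data as a parameter instead of querying again
    return ['temp trace for %s' % row[0] for row in th_data]

def buildHGraph(th_data):
    return ['humidity trace for %s' % row[0] for row in th_data]

def IndoorTHPage():
    # query once, share the result with both graph builders
    th_data = generateTHData()
    return buildTGraph(th_data), buildHGraph(th_data)

t_graph, h_graph = IndoorTHPage()
print(query_count)  # the SQL stand-in ran only once
```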
In addition, I've made a number of changes using Pandas and connection pooling. Interested in me sharing those?
Thanks,
Jason
|
|
|
Post by SDL on Nov 12, 2021 7:50:57 GMT -8
Jason,
definitely. I'd like to see that. We are just working on the dash_app for SmartGarden3 and I can implement them there first.
BP
|
|
|
Post by Jason on Nov 12, 2021 8:03:28 GMT -8
I chose to centralize connection management in my version of the Dash app. I did so by changing the Python library I'm using to communicate with the database and also by writing my own class for connection pooling. The library I'm using to talk to MariaDB can be found here. Within it is offered a connection pool that I'm leveraging in my database class, which currently is as follows:

import json
import mariadb
import pandas as pd


class DbConnPool:
    def __init__(self, user, password, host, port, pool_name, pool_size):
        self.user = user
        self.password = password
        self.host = host
        self.port = port
        self.pool_name = pool_name
        self.pool_size = pool_size
        self.pool = self.create_conn_pool()

    # Create connection pool
    def create_conn_pool(self):
        print('Opening connection pool...')
        pool = mariadb.ConnectionPool(
            user=self.user,
            password=self.password,
            host=self.host,
            port=self.port,
            pool_name=self.pool_name,
            pool_size=self.pool_size
        )
        return pool

    def get_conn(self):
        print('Getting connection from pool...')
        return self.pool.get_connection()

    def close_conn(self, conn):
        print('Returning connection to pool...')
        conn.close()

    def close_conn_pool(self):
        print('Closing connection pool...')
        self.pool.close()

    def execute_sql(self, sql):
        print('Executing SQL...')
        print(sql)
        conn = self.get_conn()
        cur = conn.cursor()
        cur.execute(sql)
        data = cur.fetchall()
        # close cursor
        cur.close()
        # return connection to pool
        self.close_conn(conn)
        return data

    def execute_sql_df(self, sql):
        print('Executing SQL to DataFrame...')
        print(sql)  # debug
        conn = self.get_conn()
        print('Reading SQL...')  # debug
        # try:
        df = pd.read_sql(sql, conn)
        # except Exception as e:
        #     print(e)
        # return connection to pool
        self.close_conn(conn)
        print('Returning DataFrame...')  # debug
        return df

I manage the connection pool within index.py as follows:

# read config
with open('config.json', 'r') as config_file:
    config = json.load(config_file)

# open connection pool
conn_pool = DbConnPool(
    config['environment']['development']['database']['user'],
    config['environment']['development']['database']['password'],
    config['environment']['development']['database']['host'],
    config['environment']['development']['database']['port'],
    config['environment']['development']['database']['pool_name'],
    config['environment']['development']['database']['pool_size']
)

if __name__ == '__main__':
    try:
        print("dash_app running on port 8050")
        app.run_server(debug=True, host='0.0.0.0')
        # app.run_server(host='0.0.0.0')
    except KeyboardInterrupt as ki:
        conn_pool.close_conn_pool()

Basically, the pool is opened and passed around to the various pages and update code. As SQL is passed to the class, connections are gotten from and returned to the pool. Definitely a work in progress, however, functioning nicely so far as I transition the scripts.
Thanks,
Jason
|
|
|
Post by Jason on Nov 12, 2021 8:13:33 GMT -8
I've added pandas to several of the scripts so far to streamline the code. For example, indoorth.py uses for loops in order to build individual arrays to supply to traces added to the temperature and humidity graphs. Using a data frame eliminates the need for the looping.
My generateTHData() function is as follows (I cheated and hard-coded the 7-day interval in the SQL query instead of using the timeDelta parameter, so that still needs fixing):
def generateTHData(conn_pool, timeDelta):
    try:
        now = datetime.datetime.now()
        before = now - timeDelta
        before = before.strftime('%Y-%m-%d %H:%M:%S')
        query = 'SELECT id, ChannelID, Temperature, Humidity, TimeStamp FROM `SkyWeather2`.`IndoorTHSensors` WHERE TimeStamp > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 7 DAY)'
        records = conn_pool.execute_sql_df(query)
        return records
    except mdb.Error as e:
        traceback.print_exc()
        print("Error %d: %s" % (e.args[0], e.args[1]))

In my version, the top of the function IndoorTHPage() is as follows:

def IndoorTHPage(conn_pool):
    start = datetime.datetime.now()

    time_delta = datetime.timedelta(days=7)
    print('Getting temperature/humidity data...')
    th_data = generateTHData(conn_pool, time_delta)

    # get unique array of channels in data
    th_channels = th_data.ChannelID.unique()

    # sort channels numerically
    th_channels.sort()

    # check temperature units
    # database units are metric
    config_units = readJSON.getJSONValue("English_Metric")
    if not config_units:
        # configuration units are NOT metric
        # convert temperatures to imperial
        th_data.Temperature = CTUnits(th_data.Temperature)

    t_graph = buildTGraph(th_data, th_channels)
    h_graph = buildHGraph(th_data, th_channels)

My version of buildTemperatureGraph(), called from buildTGraph():

def buildTemperatureGraph(data, channels):
    # Create figure with secondary y-axis
    fig = go.Figure()
    if len(data) == 0:
        fig.update_layout(
            height=800,
            title_text='No Indoor Temperature Data Available')
        return fig

    fig = make_subplots(specs=[[{"secondary_y": True}]])

    colors = ['red', 'blue', 'green', 'violet', 'brown', 'black', 'magenta', 'pink']

    # min and max temperatures for bounding y-axis
    temp_min = data.Temperature.min() * 0.9
    temp_max = data.Temperature.max() * 1.1

    # add traces to figure for each channel
    for channel in channels:
        # filter DataFrame to specific channel
        channel_df = data[data.ChannelID == channel]
        fig.add_trace(
            go.Scatter(
                x=channel_df.TimeStamp,
                y=channel_df.Temperature,
                name=CHANNEL_LABELS[channel - 1],
                line=dict(
                    color=colors[channel - 1],
                    width=1.5
                )
            )
        )

    # Add figure title
    fig.update_layout(
        title_text="Temperature",
        height=400
    )

    units = TUnits()

    # Set x-axis title
    fig.update_xaxes(title_text="Time")

    # Set y-axes titles
    fig.update_yaxes(
        title_text="<b>Temperature (" + units + ")</b>",
        range=(temp_min, temp_max),
        secondary_y=False,
        side='left'
    )

    return fig

My version of buildHumidityGraph(), called from buildHGraph():

def buildHumidityGraph(data, channels):
    # Create figure with secondary y-axis
    fig = go.Figure()
    if len(data) == 0:
        fig.update_layout(
            height=800,
            title_text='No Indoor Humidity Data Available')
        return fig

    fig = make_subplots(specs=[[{"secondary_y": True}]])

    colors = ['red', 'blue', 'green', 'violet', 'brown', 'black', 'magenta', 'pink']

    # add traces to figure for each channel
    for channel in channels:
        channel_df = data[data.ChannelID == channel]
        fig.add_trace(
            go.Scatter(
                x=channel_df.TimeStamp,
                y=channel_df.Humidity,
                name=CHANNEL_LABELS[channel - 1],
                line=dict(
                    color=colors[channel - 1],
                    width=1.5
                )
            )
        )

    # Add figure title
    fig.update_layout(
        title_text="Humidity",
        height=400
    )

    # units = "C"

    # Set x-axis title
    fig.update_xaxes(title_text="Time")

    # Set y-axes titles
    fig.update_yaxes(
        title_text="<b>Humidity (%)</b>",
        range=(0, 100),
        secondary_y=False,
        side='left'
    )

    return fig
Thanks,
Jason
|
|
|
Post by Jason on Nov 12, 2021 8:21:31 GMT -8
What follows is a work in progress for the weather page. Here is my Jupyter notebook for handling the various rain totals utilizing pandas.
#%%
sql = 'SELECT id, TotalRain, TimeStamp FROM SkyWeather2.WeatherData WHERE TimeStamp > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 32 DAY)'

#%%
df = pd.read_sql(sql, conn, index_col='id', parse_dates='TimeStamp')

#%% parse date
df['ts_year'] = df['TimeStamp'].dt.year
df['ts_month'] = df['TimeStamp'].dt.month
df['ts_day'] = df['TimeStamp'].dt.day

#%% Current day
today = pd.Timestamp.now(tz='US/Eastern')
today_df = df[(df.ts_day == today.day) & (df.ts_month == today.month)]
today_df

#%% Current month
month_df = df[df.ts_month == today.month]
month_df

#%% Last 30 days
td = pd.Timedelta(30, 'days')
start_date = today.date() - td
last_thirty_days_df = df[df.TimeStamp.dt.date >= start_date]
last_thirty_days_df

#%% Last 24 hours
td = pd.Timedelta(24, 'hours')
start_ts = today - td
last_day_df = df[df.TimeStamp >= pd.Timestamp(start_ts).tz_convert(None)]
last_day_df

#%% Last 7 days
td = pd.Timedelta(6, 'days')
start_date = today.date() - td
last_days_df = df[df.TimeStamp.dt.date >= start_date]
last_days_df

accum = last_days_df['TotalRain'].iloc[-1] - last_days_df['TotalRain'].iloc[0]
Basically, I only do a single query to get the rain data as a data frame and then filter that data frame in order to calculate the various accumulations.
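The query-once-then-filter pattern can be demonstrated with a small synthetic DataFrame. The data below is made up for illustration; the real query returns TotalRain readings from WeatherData, and TotalRain is treated as a cumulative counter:

```python
import pandas as pd

# synthetic cumulative rain readings, one per day
df = pd.DataFrame({
    'TotalRain': [10.0, 10.2, 10.5, 11.0, 11.1],
    'TimeStamp': pd.to_datetime([
        '2021-11-08', '2021-11-09', '2021-11-10',
        '2021-11-11', '2021-11-12',
    ]),
})

# filter the frame instead of issuing another SQL query
start_date = pd.Timestamp('2021-11-10').date()
recent = df[df.TimeStamp.dt.date >= start_date]

# accumulation = last cumulative reading minus first
accum = recent['TotalRain'].iloc[-1] - recent['TotalRain'].iloc[0]
print(accum)
```

Each of the other windows (current day, current month, last 24 hours) is just a different boolean mask over the same frame, so the database is only hit once.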
Thanks,
Jason
|
|
|
Post by SDL on Nov 12, 2021 11:17:06 GMT -8
Cleaner code all around. Of course, data frames still have internal loops, but they are probably faster than the Python loops. I use Pandas all over my book.
John
|
|