|
Post by Jason on Oct 19, 2021 8:17:49 GMT -8
Been messing around with possible optimizations of the Dash app that aren't too invasive. In order to identify bottlenecks, I added some code to measure page loading times. The status page titled SkyWeather Status is the first page loaded in a new browser session and is also one of the slowest pages to load. The average loading time for the page was ~3.5 minutes. My use case is slightly different than most as I have a distributed architecture where various infrastructure components are running on different RPis separate from my SkyWeather2 RPi.
I've got MariaDB running on a RPi 1 on my desk. While not the most powerful board obviously, it is still more than sufficient to run MariaDB and serve data to the Dash app assuming the underlying data model is optimized for serving the data. As pointed out by another board member, and through my own testing, the data model is in need of some optimization.
The following database indexes were added to my database:
CREATE OR REPLACE INDEX idx_wr2_ts ON SkyWeather2.WeatherData (TimeStamp);
CREATE OR REPLACE INDEX idx_wsaqi_ts ON WeatherSenseWireless.AQI433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wstb_ts ON WeatherSenseWireless.TB433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wsaf_ts ON WeatherSenseWireless.AS433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_sm_ts ON WeatherSenseWireless.SolarMax433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_iths_ts ON SkyWeather2.IndoorTHSensors (TimeStamp);
As mentioned previously, it was taking ~3.5 minutes to load the status page prior to adding the indexes. After adding the indexes, the page loads in ~1s. Admittedly, some of the time savings are likely due to MariaDB caching the SQL and avoiding additional parsing of SQL it has already seen and successfully parsed. However, savings from the parsing are likely measured in milliseconds, not minutes. I'm going to continue testing and will add to this thread if additional indexes are needed.
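For reference, the kind of load-time measurement described above can be sketched as a simple decorator. This is a hypothetical sketch of my own, not the actual measurement code used in the app; the function name build_status_page is just a stand-in for the real page-building work:

```python
import functools
import time

def timed(func):
    """Print how long the wrapped page-builder takes to run."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print('%s took %.3fs' % (func.__name__, elapsed))
        return result
    return wrapper

@timed
def build_status_page():
    # stand-in for the real Dash page-building work
    time.sleep(0.1)
    return 'status page'

build_status_page()
```

Decorating each page callback this way makes it easy to spot which pages dominate the load time before and after a change.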
Thanks,
Jason
|
|
|
Post by SDL on Oct 20, 2021 14:12:15 GMT -8
Jason,
Makes perfect sense. The timestamp was not indexed. I will add this to our databases. GREAT FIND! I had just never taken the time to look at optimizations.
BP
|
|
|
Post by Jason on Oct 21, 2021 2:08:14 GMT -8
Yeah my mind immediately filled with overly complicated architectures I’ve used in the past such as caching with Redis, etc. Fortunately, I realized anything too invasive was work I just didn’t feel like doing lol (sometimes laziness is a gift 😉) and instead focused on query optimization. I’m still a bit puzzled by the number of times some queries execute, but, for now, they execute pretty quickly.
Jason
|
|
|
Post by Jason on Oct 21, 2021 4:57:40 GMT -8
Looked at the rest of the objects in the two schemas and here is a more complete list of indexes based on timestamps.
CREATE OR REPLACE INDEX idx_iths_ts ON SkyWeather2.IndoorTHSensors (TimeStamp);
CREATE OR REPLACE INDEX idx_ps_ts ON SkyWeather2.PowerSystem (TimeStamp);
CREATE OR REPLACE INDEX idx_sl_ts ON SkyWeather2.SystemLog (TimeStamp);
CREATE OR REPLACE INDEX idx_wsaqi_ts ON WeatherSenseWireless.AQI433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wsaf_ts ON WeatherSenseWireless.AS433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wsg_ts ON WeatherSenseWireless.Generic (TimeStamp);
CREATE OR REPLACE INDEX idx_iths_ts ON WeatherSenseWireless.IndoorTHSensors (TimeStamp);
CREATE OR REPLACE INDEX idx_skp_ts ON WeatherSenseWireless.SkyCamPictures (timestamp);
CREATE OR REPLACE INDEX idx_sks_ts ON WeatherSenseWireless.SkyCamSensors (timestamp);
CREATE OR REPLACE INDEX idx_sm_ts ON WeatherSenseWireless.SolarMax433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wstb_ts ON WeatherSenseWireless.TB433MHZ (TimeStamp);
CREATE OR REPLACE INDEX idx_wr2_ts ON WeatherSenseWireless.WeatherData (TimeStamp);
Jason
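One way to confirm an index is actually being used is to run EXPLAIN on the query. The sketch below uses SQLite's built-in EXPLAIN QUERY PLAN purely for illustration (MariaDB's EXPLAIN output looks different, and the table here is a stripped-down stand-in), but the idea is the same: a range predicate on the indexed TimeStamp column should show the index in the plan:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# minimal stand-in for the real WeatherData table
cur.execute('CREATE TABLE WeatherData (id INTEGER PRIMARY KEY, TimeStamp TEXT)')
cur.execute('CREATE INDEX idx_wr2_ts ON WeatherData (TimeStamp)')

# the plan should reference idx_wr2_ts for a TimeStamp range scan
plan = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM WeatherData WHERE TimeStamp > '2021-10-01'"
).fetchall()
print(plan)
```

In MariaDB the equivalent check is `EXPLAIN SELECT ...`, where the `key` column of the output should list the new index instead of showing a full table scan.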
|
|
|
Post by SDL on Oct 23, 2021 9:01:03 GMT -8
I'm adding this to our new version of SkyWeather2.
I'm also fixing this in SmartGarden3.
Different program, but same problem.
BP
|
|
|
Post by Jason on Nov 12, 2021 5:16:15 GMT -8
Another suggestion that might be of interest. Within indoorth.py, the SQL to retrieve temp/humidity data is called twice due to the structure of the script. Currently, the function generateTHData() is called inside the functions buildTGraph() and buildHGraph(), and those two functions are called from the main entry function IndoorTHPage(). I moved the call to generateTHData() into IndoorTHPage() and now pass the query results to buildTGraph() and buildHGraph() as a parameter, which avoids incurring the extra SQL query.
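The shape of that refactor can be sketched with stand-ins for the real functions. The names follow indoorth.py, but the bodies here are hypothetical (a counter replaces the actual SQL so the single-query behavior is visible):

```python
query_count = 0

def generateTHData():
    """Stand-in for the real SQL query in indoorth.py."""
    global query_count
    query_count += 1
    return [('ch1', 21.5, 45.0)]  # fake (channel, temp, humidity) rows

def buildTGraph(th_data):
    # receives the data as a parameter instead of querying again
    return ['temp trace for %s' % row[0] for row in th_data]

def buildHGraph(th_data):
    return ['humidity trace for %s' % row[0] for row in th_data]

def IndoorTHPage():
    # query once, share the result with both graph builders
    th_data = generateTHData()
    return buildTGraph(th_data), buildHGraph(th_data)

t_graph, h_graph = IndoorTHPage()
print(query_count)  # the SQL stand-in ran only once
```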
In addition, I've made a number of changes using Pandas and connection pooling. Interested in me sharing those?
Thanks,
Jason
|
|
|
Post by SDL on Nov 12, 2021 7:50:57 GMT -8
Jason,
definitely. I'd like to see that. We are just working on the dash_app for SmartGarden3 and I can implement them there first.
BP
|
|
|
Post by Jason on Nov 12, 2021 8:03:28 GMT -8
I chose to centralize connection management in my version of the Dash app. I did so by changing the Python library I'm using to communicate with the database and also by writing my own class for connection pooling. The library I'm using to talk to MariaDB can be found here. Within it is offered a connection pool that I'm leveraging in my database class, which currently is as follows:

import json
import mariadb
import pandas as pd


class DbConnPool:
    def __init__(self, user, password, host, port, pool_name, pool_size):
        self.user = user
        self.password = password
        self.host = host
        self.port = port
        self.pool_name = pool_name
        self.pool_size = pool_size
        self.pool = self.create_conn_pool()

    # Create connection pool
    def create_conn_pool(self):
        print('Opening connection pool...')
        pool = mariadb.ConnectionPool(
            user=self.user,
            password=self.password,
            host=self.host,
            port=self.port,
            pool_name=self.pool_name,
            pool_size=self.pool_size
        )
        return pool

    def get_conn(self):
        print('Getting connection from pool...')
        return self.pool.get_connection()

    def close_conn(self, conn):
        print('Returning connection to pool...')
        conn.close()

    def close_conn_pool(self):
        print('Closing connection pool...')
        self.pool.close()

    def execute_sql(self, sql):
        print('Executing SQL...')
        print(sql)
        conn = self.get_conn()
        cur = conn.cursor()
        cur.execute(sql)
        data = cur.fetchall()
        # close cursor
        cur.close()
        # return connection to pool
        self.close_conn(conn)
        return data

    def execute_sql_df(self, sql):
        print('Executing SQL to DataFrame...')
        print(sql)  # debug
        conn = self.get_conn()
        print('Reading SQL...')  # debug
        # try:
        df = pd.read_sql(sql, conn)
        # except Exception as e:
        #     print(e)
        # return connection to pool
        self.close_conn(conn)
        print('Returning DataFrame...')  # debug
        return df

I manage the connection pool within index.py as follows:

# read config
with open('config.json', 'r') as config_file:
    config = json.load(config_file)

# open connection pool
conn_pool = DbConnPool(
    config['environment']['development']['database']['user'],
    config['environment']['development']['database']['password'],
    config['environment']['development']['database']['host'],
    config['environment']['development']['database']['port'],
    config['environment']['development']['database']['pool_name'],
    config['environment']['development']['database']['pool_size']
)

if __name__ == '__main__':
    try:
        print("dash_app running on port 8050")
        app.run_server(debug=True, host='0.0.0.0')
        # app.run_server(host='0.0.0.0')
    except KeyboardInterrupt as ki:
        conn_pool.close_conn_pool()

Basically, the pool is opened and passed around to the various pages and update code. As SQL is passed to the class, connections are gotten from and returned to the pool. Definitely a work in progress, however, functioning nicely so far as I transition the scripts.
Thanks,
Jason
|
|
|
Post by Jason on Nov 12, 2021 8:13:33 GMT -8
I've added pandas to several of the scripts so far to streamline the code. For example, indoorth.py uses for loops in order to build individual arrays to supply to traces added to the temperature and humidity graphs. Using a data frame eliminates the need for the looping.
My generateTHData() function is as follows (I cheated and hard-coded the 7-day interval in the SQL query instead of using the timeDelta parameter, so that still needs fixing):
def generateTHData(conn_pool, timeDelta):
    try:
        now = datetime.datetime.now()
        before = now - timeDelta
        before = before.strftime('%Y-%m-%d %H:%M:%S')
        query = 'SELECT id, ChannelID, Temperature, Humidity, TimeStamp FROM `SkyWeather2`.`IndoorTHSensors` WHERE TimeStamp > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 7 DAY)'
        records = conn_pool.execute_sql_df(query)
        return records
    except mdb.Error as e:
        traceback.print_exc()
        print("Error %d: %s" % (e.args[0], e.args[1]))

In my version, the top of the function IndoorTHPage() is as follows:

def IndoorTHPage(conn_pool):
    start = datetime.datetime.now()

    time_delta = datetime.timedelta(days=7)
    print('Getting temperature/humidity data...')
    th_data = generateTHData(conn_pool, time_delta)

    # get unique array of channels in data
    th_channels = th_data.ChannelID.unique()

    # sort channels numerically
    th_channels.sort()

    # check temperature units
    # database units are metric
    config_units = readJSON.getJSONValue("English_Metric")
    if not config_units:
        # configuration units are NOT metric
        # convert temperatures to imperial
        th_data.Temperature = CTUnits(th_data.Temperature)

    t_graph = buildTGraph(th_data, th_channels)
    h_graph = buildHGraph(th_data, th_channels)

My version of buildTemperatureGraph(), called from buildTGraph():

def buildTemperatureGraph(data, channels):
    # Create figure with secondary y-axis
    fig = go.Figure()
    if len(data) == 0:
        fig.update_layout(
            height=800,
            title_text='No Indoor Temperature Data Available')
        return fig

    fig = make_subplots(specs=[[{"secondary_y": True}]])

    colors = ['red', 'blue', 'green', 'violet', 'brown', 'black', 'magenta', 'pink']

    # min and max temperatures for bounding y-axis
    temp_min = data.Temperature.min() * 0.9
    temp_max = data.Temperature.max() * 1.1

    # add traces to figure for each channel
    for channel in channels:
        # filter DataFrame to specific channel
        channel_df = data[data.ChannelID == channel]
        fig.add_trace(
            go.Scatter(
                x=channel_df.TimeStamp,
                y=channel_df.Temperature,
                name=CHANNEL_LABELS[channel - 1],
                line=dict(
                    color=colors[channel - 1],
                    width=1.5
                )
            )
        )

    # Add figure title
    fig.update_layout(
        title_text="Temperature",
        height=400
    )

    units = TUnits()

    # Set x-axis title
    fig.update_xaxes(title_text="Time")

    # Set y-axes titles
    fig.update_yaxes(
        title_text="<b>Temperature (" + units + ")</b>",
        range=(temp_min, temp_max),
        secondary_y=False,
        side='left'
    )

    return fig

My version of buildHumidityGraph(), called from buildHGraph():

def buildHumidityGraph(data, channels):
    # Create figure with secondary y-axis
    fig = go.Figure()
    if len(data) == 0:
        fig.update_layout(
            height=800,
            title_text='No Indoor Humidity Data Available')
        return fig

    fig = make_subplots(specs=[[{"secondary_y": True}]])

    colors = ['red', 'blue', 'green', 'violet', 'brown', 'black', 'magenta', 'pink']

    # add traces to figure for each channel
    for channel in channels:
        channel_df = data[data.ChannelID == channel]
        fig.add_trace(
            go.Scatter(
                x=channel_df.TimeStamp,
                y=channel_df.Humidity,
                name=CHANNEL_LABELS[channel - 1],
                line=dict(
                    color=colors[channel - 1],
                    width=1.5
                )
            )
        )

    # Add figure title
    fig.update_layout(
        title_text="Humidity",
        height=400
    )

    # units = "C"

    # Set x-axis title
    fig.update_xaxes(title_text="Time")

    # Set y-axes titles
    fig.update_yaxes(
        title_text="<b>Humidity (%)</b>",
        range=(0, 100),
        secondary_y=False,
        side='left'
    )

    return fig
Thanks,
Jason
|
|
|
Post by Jason on Nov 12, 2021 8:21:31 GMT -8
What follows is a work in progress for the weather page. Here is my Jupyter notebook for handling the various rain totals utilizing pandas.
#%%
sql = 'SELECT id, TotalRain, TimeStamp FROM SkyWeather2.WeatherData WHERE TimeStamp > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 32 DAY)'

#%%
df = pd.read_sql(sql, conn, index_col='id', parse_dates='TimeStamp')

#%% parse date
df['ts_year'] = df['TimeStamp'].dt.year
df['ts_month'] = df['TimeStamp'].dt.month
df['ts_day'] = df['TimeStamp'].dt.day

#%% Current day
today = pd.Timestamp.now(tz='US/Eastern')
today_df = df[(df.ts_day == today.day) & (df.ts_month == today.month)]
today_df

#%% Current month
month_df = df[df.ts_month == today.month]
month_df

#%% Last 30 days
td = pd.Timedelta(30, 'days')
start_date = today.date() - td
last_thirty_days_df = df[df.TimeStamp.dt.date >= start_date]
last_thirty_days_df

#%% Last 24 hours
td = pd.Timedelta(24, 'hours')
start_ts = today - td
last_day_df = df[df.TimeStamp >= pd.Timestamp(start_ts).tz_convert(None)]
last_day_df

#%% Last 7 days
td = pd.Timedelta(6, 'days')
start_date = today.date() - td
last_days_df = df[df.TimeStamp.dt.date >= start_date]
last_days_df

accum = last_days_df['TotalRain'].iloc[-1] - last_days_df['TotalRain'].iloc[0]
Basically, I only do a single query to get the rain data as a data frame and then filter that data frame in order to calculate the various accumulations.
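The query-once-then-filter pattern can be demonstrated with a small synthetic DataFrame. The data below is made up for illustration; the real query returns TotalRain readings from WeatherData, and TotalRain is treated as a cumulative counter:

```python
import pandas as pd

# synthetic cumulative rain readings, one per day
df = pd.DataFrame({
    'TotalRain': [10.0, 10.2, 10.5, 11.0, 11.1],
    'TimeStamp': pd.to_datetime([
        '2021-11-08', '2021-11-09', '2021-11-10',
        '2021-11-11', '2021-11-12',
    ]),
})

# filter the frame instead of issuing another SQL query
start_date = pd.Timestamp('2021-11-10').date()
recent = df[df.TimeStamp.dt.date >= start_date]

# accumulation = last cumulative reading minus first
accum = recent['TotalRain'].iloc[-1] - recent['TotalRain'].iloc[0]
print(accum)
```

Each of the other windows (current day, current month, last 24 hours) is just a different boolean mask over the same frame, so the database is only hit once.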
Thanks,
Jason
|
|
|
Post by SDL on Nov 12, 2021 11:17:06 GMT -8
Cleaner code all around. Of course, data frames still have internal loops, but they are probably faster than the Python loops. I use Pandas all over my book.
John
|
|