lsst.sims.maf.runComparison package

Submodules

lsst.sims.maf.runComparison.runComparison module

class lsst.sims.maf.runComparison.runComparison.RunComparison(baseDir, runlist, rundirs=None, defaultResultsDb='resultsDb_sqlite.db', verbose=False)[source]

Bases: object

Class to read multiple results databases, find requested summary metric comparisons, and store the results in DataFrames on the class.

Sets up the runs to compare and opens connections to all resultsDb_sqlite.db files under baseDir/runlist[1-N] and their subdirectories. Expects a directory structure like:

baseDir -> run1 -> subdirectory1 (e.g. 'scheduler', containing a resultsDb_sqlite.db file)
                -> ...
                -> subdirectoryN
        -> ...
        -> runN -> subdirectoryX

Parameters:
  • baseDir (str) – The root directory containing all of the underlying runs and their subdirectories.
  • runlist (list) – A list of runs to compare.
  • rundirs (list) – A list of directories (relative to baseDir) where the runs in runlist reside. Optional - if not provided, assumes directories are simply the names in runlist. Must have same length as runlist (note that runlist can contain duplicate entries).
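As an illustration of the expected layout (a sketch only, not the class's actual implementation), the following builds a matching directory tree with hypothetical run and subdirectory names and locates each run's resultsDb_sqlite.db the way the constructor's search would:

```python
import os
import tempfile

# Build a hypothetical baseDir layout matching the expected structure.
baseDir = tempfile.mkdtemp()
for run in ("run1", "run2"):
    subdir = os.path.join(baseDir, run, "scheduler")
    os.makedirs(subdir)
    # Create an empty placeholder results database file.
    open(os.path.join(subdir, "resultsDb_sqlite.db"), "w").close()

# Walk baseDir/<run> to find each run's results database,
# as is done for every entry in runlist.
found = {}
for run in ("run1", "run2"):
    for root, dirs, files in os.walk(os.path.join(baseDir, run)):
        if "resultsDb_sqlite.db" in files:
            found[run] = os.path.join(root, "resultsDb_sqlite.db")

print(sorted(found))  # ['run1', 'run2']
```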
addSummaryStats(metricDict)[source]

Combine the summary statistics of a set of metrics into a pandas dataframe that is indexed by the opsim run name.

Parameters:metricDict (dict) – A dictionary of metrics with all of the information needed to query a results database. The metric/metadata/slicer/summary values referred to by a metricDict value could be unique but don’t have to be.
Returns: A pandas dataframe containing a column for each of the dictionary keys in the metricDict, indexed by the names of the opsim runs:

index      metric1          metric2
<run_123>  <metricValue1>   <metricValue2>
<run_124>  <metricValue1>   <metricValue2>

Return type: pandas.DataFrame
buildMetricDict(metricNameLike=None, metricMetadataLike=None, slicerNameLike=None, subdir=None)[source]

Return a metric dictionary based on finding all metrics which match ‘like’ the various parameters.

Parameters:
  • metricNameLike (str, opt) – Metric name like this – i.e. will look for metrics which match metricName like “value”.
  • metricMetadataLike (str, opt) – Metric Metadata like this.
  • slicerNameLike (str, opt) – Slicer name like this.
  • subdir (str, opt) – Find metrics from this subdir only. If other parameters are not specified, this returns all metrics within this subdir.
Returns: Key = self-created metric 'name', value = Dict{metricName, metricMetadata, slicerName}
Return type: Dict
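A sketch of the returned dictionary's shape, using a hypothetical metric (the key naming is an assumption; only the value fields metricName, metricMetadata, and slicerName come from the documentation above):

```python
# Hypothetical metric dictionary in the shape returned by buildMetricDict:
# key = self-created metric 'name', value = dict of results-database query fields.
metricDict = {
    "CoaddM5 r band HealpixSlicer": {
        "metricName": "CoaddM5",
        "metricMetadata": "r band",
        "slicerName": "HealpixSlicer",
    }
}

# Such a dictionary can then be passed to addSummaryStats(metricDict).
print(sorted(metricDict["CoaddM5 r band HealpixSlicer"]))
```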

close()[source]

Close all connections to the results database files.

filterCols(summaryName)[source]

Return a dataframe containing only stats which match summaryName.

Parameters: summaryName (str) – The type of summary stat to match (e.g. Max, Mean).
Returns:
Return type: pd.DataFrame
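One plausible way to express this filtering with pandas (the column naming convention here is an assumption for illustration: summary-stat names are taken to prefix the column labels):

```python
import pandas as pd

# Hypothetical stats dataframe whose column labels carry the summary stat name.
stats = pd.DataFrame(
    {
        "Mean CoaddM5": [24.5, 25.1],
        "Max CoaddM5": [26.0, 26.4],
        "Mean Nvisits": [180.0, 195.0],
    },
    index=["run_123", "run_124"],
)

# Keep only the columns whose summary stat matches 'Mean'.
mean_only = stats[[c for c in stats.columns if c.startswith("Mean")]]
print(list(mean_only.columns))  # ['Mean CoaddM5', 'Mean Nvisits']
```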
findChanges(threshold=0.05)[source]

Return a dataframe containing only values which changed by more than the threshold.

Parameters: threshold (float, opt) – Identify values which change by more than threshold (%) in the normalized values. Default 5% (0.05).
Returns:
Return type: pd.DataFrame
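Since the normalized values (see normalizeStats) measure the fractional change relative to the baseline, a change of more than the threshold means the normalized value lies farther than 0.05 from zero. A pandas sketch under that assumption, with hypothetical runs and values:

```python
import pandas as pd

threshold = 0.05
# Hypothetical normalized stats; the baseline row is all zeros by construction.
norm = pd.DataFrame(
    {"metric1": [0.0, 0.02, 0.10], "metric2": [0.0, -0.08, 0.01]},
    index=["baseline", "run_123", "run_124"],
)

# Keep rows where at least one metric changed by more than the threshold.
changed = norm[(norm.abs() > threshold).any(axis=1)]
print(list(changed.index))  # ['run_123', 'run_124']
```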
generateDiffHtml(normalized=False, html_out=None, show_page=False, combined=False, fullStats=False)[source]

Use bokeh to convert a summaryStats dataframe to an interactive html table.

Parameters:
  • normalized (bool, opt) – If True, generate the html table from the normalized stats.
  • html_out (str, opt) – Name of the html file that will be output and saved. If no string is provided, the html table will not be saved.
  • show_page (bool, opt) – If True, the html page generated by this function will automatically open in your browser.
  • combined (bool, opt) – If True, the html produced will have columns for the original summaryStats values as well as their normalized values. The baselineRun used to calculate the normalized values will be dropped from the table.
  • fullStats (bool, opt) – If False, the final html table will not include summaryStats that contain '3Sigma', 'Rms', 'Min', 'Max', 'RobustRms', or '%ile' in their names.
getFileNames(metricName, metricMetadata=None, slicerName=None)[source]

For each of the runs in runlist, get the paths to the datafiles for a given metric.

Parameters:
  • metricName (str) – The name of the original metric.
  • metricMetadata (str, opt) – The metric metadata specifying the metric desired (optional).
  • slicerName (str, opt) – The slicer name specifying the metric desired (optional).
Returns: Keys: runName, Value: path to file
Return type: Dict

normalizeStats(baselineRun)[source]

Normalize the summary metric values in the dataframe resulting from addSummaryStats, based on the values of a single baseline run.

Parameters: baselineRun (str) – The name of the opsim run that will serve as the baseline.
Returns: A pandas dataframe containing a column of normalized values for each of the summary metrics, indexed by the names of the opsim runs:

index      metric1               metric2
<run_123>  <norm_metricValue1>   <norm_metricValue2>
<run_124>  <norm_metricValue1>   <norm_metricValue2>

Return type: pandas.DataFrame

Notes

The metric values are normalized in the following way:
norm_metric_value(run) = (metric_value(run) - metric_value(baselineRun)) / metric_value(baselineRun)
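The normalization described above can be sketched directly with pandas (run names and values here are hypothetical):

```python
import pandas as pd

# Hypothetical summary stats for a baseline run and one comparison run.
stats = pd.DataFrame(
    {"metric1": [100.0, 105.0], "metric2": [2.0, 1.8]},
    index=["baseline_run", "run_123"],
)

# norm_metric_value(run) = (metric_value(run) - metric_value(baselineRun))
#                          / metric_value(baselineRun)
baseline = stats.loc["baseline_run"]
norm = (stats - baseline) / baseline
print(norm.loc["run_123", "metric1"])  # 0.05
```

By construction the baseline run's row is all zeros, which is why findChanges can simply look for normalized values farther than the threshold from zero.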
plotMetricData(bundleDict, plotFunc, runlist=None, userPlotDict=None, layout=None, outDir=None, paramTitles=False, paramCols=None, savefig=False)[source]
plotSummaryStats()[source]
readMetricData(metricName, metricMetadata, slicerName)[source]
sortCols(baseName=True, summaryName=True)[source]

Return the columns (in order) to display a sorted version of the stats dataframe.

Parameters:
  • baseName (bool, opt) – Sort by the baseName. Default True. If True, this takes priority in the sorted results.
  • summaryName (bool, opt) – Sort by the summary stat name (summaryName). Default True.
Returns:
Return type: list

variedParameters(paramNameLike=None, dbDir=None)[source]

Query the opsim configuration table for a set of user defined configuration parameters for a set of runs.

Parameters:
  • paramNameLike (list, opt) – A list of opsim configuration parameters to pull out of the configuration table.
Returns: A pandas dataframe containing a column for each of the configuration parameters given in paramNameLike, indexed by the names of the opsim runs:

runName    parameter1            parameter2
<run_123>  <parameterValues1>    <parameterValues2>

Return type: pandas.DataFrame

Notes

This method will use the sqlite ‘like’ function to query the configuration table. Below is an example of how the items in paramNameLike need to be formatted: [“%WideFastDeep%hour_angle_bonus%”, “%WideFastDeep%airmass_bonus%”].
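A self-contained sketch of how sqlite 'like' matching behaves on such patterns (the table and column names here are hypothetical, not the actual opsim configuration schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Config (paramName TEXT, paramValue TEXT)")
conn.executemany(
    "INSERT INTO Config VALUES (?, ?)",
    [
        ("WideFastDeep/basis_functions/hour_angle_bonus", "0.3"),
        ("WideFastDeep/basis_functions/airmass_bonus", "0.1"),
        ("DeepDrilling/basis_functions/airmass_bonus", "0.05"),
    ],
)

# '%' matches any run of characters, so this pattern selects only the
# WideFastDeep airmass_bonus parameter.
rows = conn.execute(
    "SELECT paramName, paramValue FROM Config WHERE paramName LIKE ?",
    ("%WideFastDeep%airmass_bonus%",),
).fetchall()
print(rows)  # [('WideFastDeep/basis_functions/airmass_bonus', '0.1')]
```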

lsst.sims.maf.runComparison.summaryStatPlotters module

lsst.sims.maf.runComparison.summaryStatPlotters.plotSummaryStats(self, output=None, totalVisits=True)[source]

Plot the normalized metric values as a function of opsim run.

Parameters:
  • output (str, opt) – Name of the figure to save to disk. If this is left as None, the figure is not saved.
  • totalVisits (bool) – If True, the total number of visits is included in the metrics plotted. When comparing runs of very different lengths it is recommended to set this flag to False.

Module contents