arviz_plots.plot_rank_dist


arviz_plots.plot_rank_dist(dt, var_names=None, filter_vars=None, group='posterior', coords=None, sample_dims=None, compact=True, combined=False, kind=None, ci_prob=None, plot_collection=None, backend=None, labeller=None, aes_by_visuals=None, visuals=None, stats=None, **pc_kwargs)

Plot 1D marginal distributions and fractional rank Δ-ECDF plots.

Rank plots are built by replacing the posterior draws with their ranks computed over all chains. Then each chain is plotted independently. If all of the chains are targeting the same posterior, we expect the ranks in each chain to be uniformly distributed. To simplify comparison, we compute the ordered fractional ranks, which are distributed uniformly in [0, 1]. Additionally, we plot the Δ-ECDF, that is, the difference between the observed ECDF and the expected CDF. Simultaneous confidence bands are computed using the simulation method described in [1].
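The rank transformation and the Δ-ECDF described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the arviz_plots implementation: the helper names fractional_ranks and delta_ecdf are hypothetical, ties are broken by order of appearance, and the Δ-ECDF is evaluated on a fixed grid of points inside (0, 1).

```python
import bisect
import random


def fractional_ranks(chains):
    """Rank all draws jointly, then return each chain's ranks divided by the total count.

    `chains` is a list of lists of draws, one inner list per chain (hypothetical
    layout). Ties are broken by order of appearance, which is an assumption;
    average ranks are another common choice.
    """
    pooled = [(x, c, i) for c, chain in enumerate(chains) for i, x in enumerate(chain)]
    ranks = [[0] * len(chain) for chain in chains]
    # Python's sort is stable, so equal draws keep their pooled order.
    for rank, (_, c, i) in enumerate(sorted(pooled, key=lambda t: t[0]), start=1):
        ranks[c][i] = rank
    n_total = len(pooled)
    return [[r / n_total for r in chain_ranks] for chain_ranks in ranks]


def delta_ecdf(values, grid):
    """Observed ECDF of `values` minus the uniform CDF, evaluated at each grid point."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many values are <= z; the uniform CDF at z is z itself.
    return [bisect.bisect_right(ordered, z) / n - z for z in grid]


# Usage: four chains drawn from the same distribution should give small Δ-ECDFs.
rng = random.Random(0)
chains = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(4)]
grid = [k / 20 for k in range(1, 20)]
for chain_id, chain_ranks in enumerate(fractional_ranks(chains)):
    largest = max(abs(d) for d in delta_ecdf(chain_ranks, grid))
    print(f"chain {chain_id}: max |Δ-ECDF| = {largest:.3f}")
```

Pooling the draws before ranking is what makes the per-chain comparison meaningful: a chain that explores a different region than the others ends up with fractional ranks concentrated near 0 or 1, which shows up as a large Δ-ECDF.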

Parameters:
dt : xarray.DataTree

Input data.

var_names : str or list of str, optional

One or more variables to be plotted. Prefix the variables with ~ when you want to exclude them from the plot.

filter_vars : {None, “like”, “regex”}, optional, default=None

If None (default), interpret var_names as the real variable names. If “like”, interpret var_names as substrings of the real variable names. If “regex”, interpret var_names as regular expressions on the real variable names.

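group : str, default “posterior”

Group to be plotted.

coords : dict, optional

Coordinates of the variables to be plotted, given as a mapping from dimension name to the coordinate values to keep.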

sample_dims : str or sequence of hashable, optional

Dimensions to reduce unless mapped to an aesthetic. Defaults to rcParams["data.sample_dims"].

compact : bool, default True

Plot multidimensional variables in a single plot.

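combined : bool, default False

Whether to combine the draws from all chains into a single marginal distribution (True) or to plot the distribution of each chain separately (False). Ignored when the “chain” dimension is not present.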

kind : {“kde”, “hist”, “dot”, “ecdf”}, optional

How to represent the marginal density. Defaults to rcParams["plot.density_kind"].

ci_prob : float, optional

Probability that should be contained within the simultaneous confidence band plotted for the fractional ranks. Defaults to rcParams["stats.ci_prob"].

plot_collection : PlotCollection, optional

backend : {“matplotlib”, “bokeh”}, optional

labeller : labeller, optional
aes_by_visuals : mapping, optional

Mapping of visuals to aesthetics that should use their mapping in plot_collection when plotted. The defaults depend on the combination of compact and combined; see the examples section for an illustrated description. Valid keys are the same as for visuals.

visuals : mapping of {str : mapping or False}, optional

Valid keys are:

  • “dist” -> depending on the value of kind, passed to:

    • “kde” -> passed to line_xy

    • “ecdf” -> passed to ecdf_line

    • “hist” -> passed to hist

  • “rank” -> passed to ecdf_line

  • “label” -> labelled_x and labelled_y

  • “ticklabels” -> ticklabel_props

  • “xlabel_rank” -> labelled_x

  • “remove_axis” -> not passed anywhere, can only be False to skip calling this function

statsmapping, optional

Valid keys are:

  • dist -> passed to kde, ecdf, …

  • ecdf_pit -> passed to ecdf_pit. Default is {"n_simulation": 1000}.

**pc_kwargs

Passed to arviz_plots.PlotCollection.grid

Returns:
PlotCollection
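The var_names and filter_vars rules described above can be illustrated with a small, self-contained sketch. The helper select_var_names is hypothetical and is not the function ArviZ uses internally; in particular, rejecting a mix of negated and non-negated names is an assumption made here for simplicity.

```python
import re


def select_var_names(available, var_names=None, filter_vars=None):
    """Hypothetical helper illustrating the documented var_names/filter_vars rules.

    available: variable names present in the group, e.g. ["mu", "theta", "tau"].
    Returns the selected names in the order they appear in `available`.
    """
    if var_names is None:
        return list(available)
    if isinstance(var_names, str):
        var_names = [var_names]

    negated = [name.startswith("~") for name in var_names]
    # Assumption: mixing excluded (~) and included names in one call is rejected.
    if any(negated) and not all(negated):
        raise ValueError("var_names must be either all negated or none")
    patterns = [name[1:] if neg else name for name, neg in zip(var_names, negated)]

    def matches(var, pattern):
        if filter_vars is None:      # exact variable names
            return var == pattern
        if filter_vars == "like":    # substring match
            return pattern in var
        if filter_vars == "regex":   # regular expression search
            return re.search(pattern, var) is not None
        raise ValueError(f"unknown filter_vars: {filter_vars!r}")

    hits = [v for v in available if any(matches(v, p) for p in patterns)]
    if all(negated):
        # Every name carried ~, so the matches are excluded instead of selected.
        return [v for v in available if v not in hits]
    return hits


names = ["mu", "theta", "tau"]
print(select_var_names(names, "~theta"))       # ['mu', 'tau']
print(select_var_names(names, "t", "like"))    # ['theta', 'tau']
print(select_var_names(names, "^t", "regex"))  # ['theta', 'tau']
```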

References

[1] Säilynoja et al. Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison. Statistics and Computing 32(32). (2022) https://doi.org/10.1007/s11222-022-10090-6
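The simulation method of [1] can be sketched for a single sample: simulate uniform samples, record the smallest two-sided pointwise Binomial tail probability of each simulated ECDF, use a quantile of those minima as an adjusted pointwise level, and read the band off Binomial quantiles at that level. This is a simplified, stdlib-only sketch, not the ArviZ implementation: simultaneous_band and binom_cdf_table are hypothetical names, prob plays a role analogous to ci_prob, and the adjustment for comparing multiple samples that [1] also describes is not implemented.

```python
import bisect
import math
import random
from itertools import accumulate


def binom_cdf_table(n, p):
    """Cumulative Binomial(n, p) probabilities P(X <= k) for k = 0..n (0 < p < 1)."""
    log_pmf = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
        for k in range(n + 1)
    ]
    return list(accumulate(math.exp(v) for v in log_pmf))


def simultaneous_band(n_draws, grid, prob, n_simulations=1000, seed=0):
    """Hypothetical helper: simultaneous band for the Δ-ECDF of n_draws uniform values.

    `grid` holds evaluation points strictly inside (0, 1). Returns (lower, upper)
    lists of Δ-ECDF limits, one pair per grid point.
    """
    rng = random.Random(seed)
    cdfs = [binom_cdf_table(n_draws, z) for z in grid]

    # 1. For each simulated uniform sample, keep the smallest two-sided
    #    pointwise tail probability of its ECDF over the grid.
    min_tails = []
    for _ in range(n_simulations):
        sample = sorted(rng.random() for _ in range(n_draws))
        smallest = 1.0
        for cdf, z in zip(cdfs, grid):
            k = bisect.bisect_right(sample, z)  # number of draws <= z
            upper_tail = 1.0 - cdf[k - 1] if k > 0 else 1.0
            smallest = min(smallest, 2 * min(cdf[k], upper_tail))
        min_tails.append(smallest)

    # 2. The adjusted pointwise level gamma is the (1 - prob) quantile of those
    #    minima, so roughly a fraction `prob` of uniform ECDFs stays inside everywhere.
    min_tails.sort()
    gamma = min_tails[min(int((1 - prob) * n_simulations), n_simulations - 1)]

    # 3. Binomial quantiles at gamma / 2 and 1 - gamma / 2, shifted by the
    #    expected CDF z so that the band lives on the Δ-ECDF scale.
    lower, upper = [], []
    for cdf, z in zip(cdfs, grid):
        lo_k = min(bisect.bisect_left(cdf, gamma / 2), n_draws)
        hi_k = min(bisect.bisect_left(cdf, 1 - gamma / 2), n_draws)
        lower.append(lo_k / n_draws - z)
        upper.append(hi_k / n_draws - z)
    return lower, upper


grid = [i / 20 for i in range(1, 20)]
lower, upper = simultaneous_band(n_draws=100, grid=grid, prob=0.95, n_simulations=500)
```

Calibrating a single pointwise level by simulation is what turns many pointwise intervals into one simultaneous band: a plain 95% pointwise interval at each of the 19 grid points would be exceeded somewhere far more often than 5% of the time.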

Examples

The following examples focus on behaviour specific to plot_rank_dist. For a general introduction to batteries-included functions like this one and common usage examples, see Introduction to batteries-included plots.

Default plot_rank_dist (compact=True and combined=False). In this case, the multiple coordinate values of multidimensional variables are overlaid on the same plot; by default, the color is mapped to all dimensions of each variable (except sample_dims) so that the different coordinate values can be distinguished.

Because combined=False, each chain is also plotted, overlaid on its corresponding plot; as the color property is already taken, the chain information is encoded in the linestyle by default.

Both mappings are applied to the rank and dist elements.

>>> from arviz_plots import plot_rank_dist, style
>>> style.use("arviz-variat")
>>> from arviz_base import load_arviz_data
>>> centered = load_arviz_data('centered_eight')
>>> coords = {"school": ["Choate", "Deerfield", "Hotchkiss"]}
>>> pc = plot_rank_dist(centered, coords=coords, compact=True, combined=False)
>>> pc.add_legend(["__variable__", "school"])

plot_rank_dist with compact=True and combined=True. The aesthetic mappings stay the same as in the previous case, but now the linestyle mapping is only taken into account for the rank elements: in the left column, the data from all chains is used to generate a single distribution representation for each combination of variable and coordinate value.

As in the first case, the default color mapping, now the only mapping shared by both columns, is applied to both the rank and the dist elements.

>>> pc = plot_rank_dist(centered, coords=coords, compact=True, combined=True)
>>> pc.add_legend(["__variable__", "school"])

When compact=False, each variable and coordinate value gets its own plot, and so the color property is no longer used to encode this information. Instead, it is now used to encode the chain information.

>>> pc = plot_rank_dist(centered, coords=coords, compact=False, combined=False)

As in the other combined=True case, the aesthetics stay the same as with combined=False, but by default they are ignored when plotting the left column.

>>> pc = plot_rank_dist(centered, coords=coords, compact=False, combined=True)
>>> pc.add_legend("chain")