Encodings
The key to creating meaningful visualizations is to map properties of the data
to visual properties in order to effectively communicate information.
In Altair, this mapping of visual properties to data columns is referred to
as an encoding, and is most often expressed through the Chart.encode()
method.
For example, here we will visualize the cars dataset using four of the available encodings: x (the x-axis value), y (the y-axis value), color (the color of the marker), and shape (the shape of the point marker):
import altair as alt
from vega_datasets import data
cars = data.cars()
alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    shape='Origin'
)
For data specified as a DataFrame, Altair can automatically infer a data type for each encoding and create appropriate scales and legends to represent the data.
Encoding Channels
Altair provides a number of encoding channels that can be useful in different circumstances; the following table summarizes them:
Position Channels:
Channel | Altair Class | Description | Example |
---|---|---|---|
x | X | The x-axis value | Simple Scatter Plot |
y | Y | The y-axis value | Simple Scatter Plot |
x2 | X2 | Second x value for ranges | Error Bars showing Confidence Interval |
y2 | Y2 | Second y value for ranges | Line chart with Confidence Interval Band |
longitude | Longitude | Longitude for geo charts | Locations of US Airports |
latitude | Latitude | Latitude for geo charts | Locations of US Airports |
longitude2 | Longitude2 | Second longitude value for ranges | N/A |
latitude2 | Latitude2 | Second latitude value for ranges | N/A |
Mark Property Channels:
Channel | Altair Class | Description | Example |
---|---|---|---|
color | Color | The color of the mark | Simple Heatmap |
fill | Fill | The fill for the mark | N/A |
opacity | Opacity | The opacity of the mark | Horizon Graph |
shape | Shape | The shape of the mark | N/A |
size | Size | The size of the mark | Table Bubble Plot (Github Punch Card) |
stroke | Stroke | The stroke of the mark | N/A |
Text and Tooltip Channels:
Channel | Altair Class | Description | Example |
---|---|---|---|
text | Text | Text to use for the mark | Simple Scatter Plot with Labels |
key | Key | - | N/A |
tooltip | Tooltip | The tooltip value | Scatter Plot with Tooltips |
Hyperlink Channel:
Channel | Altair Class | Description | Example |
---|---|---|---|
href | Href | Hyperlink for points | N/A |
Level of Detail Channel:
Channel | Altair Class | Description | Example |
---|---|---|---|
detail | Detail | Additional property to group by | Selection Detail Example |
Order Channel:
Channel | Altair Class | Description | Example |
---|---|---|---|
order | Order | Sets the order of the marks | Connected Scatterplot (Lines with Custom Paths) |
Facet Channels:
Channel | Altair Class | Description | Example |
---|---|---|---|
column | Column | The column of a faceted plot | Trellis Scatter Plot |
row | Row | The row of a faceted plot | Becker's Barley Trellis Plot |
Data Types
The details of any mapping depend on the type of the data. Altair recognizes four main data types:
Data Type | Shorthand Code | Description |
---|---|---|
quantitative | Q | a continuous real-valued quantity |
ordinal | O | a discrete ordered quantity |
nominal | N | a discrete unordered category |
temporal | T | a time or date value |
If types are not specified for data input as a DataFrame, Altair defaults to quantitative for any numeric data, temporal for date/time data, and nominal for string data, but be aware that these defaults are by no means always the correct choice!
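The default rule described above can be sketched in plain Python. This is an illustration only, not Altair's actual implementation: the function name `infer_default_type` is hypothetical, the sample columns are made up, and the fallback to nominal for anything that is neither numeric nor a date is an assumption.

```python
from datetime import date, datetime
from numbers import Number


def infer_default_type(values):
    """Guess a default encoding type from a column's values (illustrative only)."""
    # bool is a subclass of int in Python, so exclude it from the numeric check.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "quantitative"
    if all(isinstance(v, (date, datetime)) for v in values):
        return "temporal"
    # Strings, and (by assumption) anything else, are treated as categories.
    return "nominal"


# Made-up sample columns, loosely modelled on the cars dataset.
columns = {
    "Horsepower": [130, 165, 150],
    "Year": [date(1970, 1, 1), date(1971, 1, 1)],
    "Origin": ["USA", "Europe", "Japan"],
    "Cylinders": [8, 4, 6],
}
inferred = {name: infer_default_type(vals) for name, vals in columns.items()}
print(inferred)
```

Note that Cylinders comes out as quantitative even though it takes only a handful of discrete values, which is the kind of case where an explicit ordinal or nominal type is often the better choice.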
The types can either be expressed in a long-form using the channel encoding classes such as X and Y, or in short-form using the Shorthand Syntax discussed below.
For example, the following two methods of specifying the type will lead to identical plots:
alt.Chart(cars).mark_point().encode(
    x='Acceleration:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
)

alt.Chart(cars).mark_point().encode(
    alt.X('Acceleration', type='quantitative'),
    alt.Y('Miles_per_Gallon', type='quantitative'),
    alt.Color('Origin', type='nominal')
)
The shorthand form, x="name:Q", is useful for its lack of boilerplate when doing quick data explorations. The long-form, alt.X('name', type='quantitative'), is useful when doing more fine-tuned adjustments to the encoding, such as binning, axis and scale properties, or more.
Specifying the correct type for your data is important, as it affects the way Altair represents your encoding in the resulting plot.
Effect of Data Type on Color Scales
As an example of this, here we will represent the same data three different ways, with the color encoded as a quantitative, ordinal, and nominal type, using three vertically-concatenated charts (see Vertical Concatenation):
base = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
).properties(
    width=150,
    height=150
)
alt.vconcat(
    base.encode(color='Cylinders:Q').properties(title='quantitative'),
    base.encode(color='Cylinders:O').properties(title='ordinal'),
    base.encode(color='Cylinders:N').properties(title='nominal'),
)
The type specification influences the way Altair, via Vega-Lite, decides on the color scale to represent the value, and influences whether a discrete or continuous legend is used.
Effect of Data Type on Axis Scales
Similarly, for x and y axis encodings, the type used for the data will affect the scales used and the characteristics of the mark. For example, here is the difference between a quantitative and ordinal scale for a column that contains integers specifying a year:
pop = data.population.url
base = alt.Chart(pop).mark_bar().encode(
    # The axis title names the aggregate actually computed (a mean, not a sum).
    alt.Y('mean(people):Q', axis=alt.Axis(title='mean population'))
).properties(
    width=200,
    height=200
)
alt.hconcat(
    base.encode(x='year:Q').properties(title='year=quantitative'),
    base.encode(x='year:O').properties(title='year=ordinal')
)
In Altair, quantitative scales include zero by default, while ordinal scales are limited to the values within the data.
Even when we override the default and exclude zero from the axis, the precise appearance of the marks is still affected by the data type:
base.encode(
    alt.X('year:Q',
        scale=alt.Scale(zero=False)
    )
)
Because quantitative values do not have an inherent width, the bars do not fill the entire space between the values. This view also makes clear the missing year of data that was not immediately apparent when we treated the years as categories.
This kind of behavior is sometimes surprising to new users, but it emphasizes the importance of thinking carefully about your data types when visualizing data: a visual encoding that is suitable for categorical data may not be suitable for quantitative data, and vice versa.
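To make the difference concrete, the following plain-Python sketch computes the axis domain that each scale type would cover for the same year values. It is a simplification under stated assumptions: `quantitative_domain` and `ordinal_domain` are hypothetical helpers, not Altair or Vega-Lite functions, and the year values are made up for illustration.

```python
def quantitative_domain(values, zero=True):
    """Continuous [min, max] interval, optionally extended to include zero."""
    lo, hi = min(values), max(values)
    if zero:
        lo, hi = min(lo, 0), max(hi, 0)
    return (lo, hi)


def ordinal_domain(values):
    """Discrete domain: only the distinct values present in the data."""
    return sorted(set(values))


# Illustrative year values with a gap (1890 is absent).
years = [1850, 1860, 1870, 1880, 1900, 1850, 1900]

print(quantitative_domain(years))              # (0, 1900)
print(quantitative_domain(years, zero=False))  # (1850, 1900)
print(ordinal_domain(years))                   # [1850, 1860, 1870, 1880, 1900]
```

The continuous interval still contains the position for 1890, so the gap is visible on a quantitative axis, whereas the ordinal domain simply has no slot for the missing year.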
Encoding Channel Options
Each encoding channel allows for a number of additional options to be expressed; these can control things like axis properties, scale properties, headers and titles, binning parameters, aggregation, sorting, and many more.
The particular options that are available vary by encoding type; the various options are listed below.
The X and Y encodings accept the following options:
Property | Type | Description |
---|---|---|
aggregate | Aggregate | Aggregation function for the field (e.g., Default value: |
axis | anyOf(Axis, null) | An object defining properties of axis's gridlines, ticks and labels. If Default value: If undefined, default axis properties are applied. |
bin | anyOf(boolean, BinParams) | A flag for binning a Default value: |
field | anyOf(string, RepeatRef) | Required. A string defining the name of the field from which to pull a data value or an object defining iterated values from the Note: Dots ( Note: |
scale | anyOf(Scale, null) | An object defining properties of the channel's scale, which is the function that transforms values in the data domain (numbers, dates, strings, etc.) to visual values (pixels, colors, sizes) of the encoding channels. If Default value: If undefined, default scale properties are applied. |
sort | Sort | Sort order for the encoded field. For continuous fields (quantitative or temporal), For discrete fields, Default value: Note: |
stack | anyOf(StackOffset, null) | Type of stacking offset if the field should be stacked. Default value: |
timeUnit | TimeUnit | Time unit (e.g., Default value: |
title | [string, null] | A title for the field. If Default value: derived from the field's name and transformation function ( Notes: |
type | Type | The encoded field's type of measurement ("quantitative", "temporal", "ordinal", or "nominal"). It can also be a "geojson" type for encoding "geoshape". |
The Color, Fill, Opacity, Shape, Size, and Stroke encodings accept the following options:
Property | Type | Description |
---|---|---|
aggregate | Aggregate | Aggregation function for the field (e.g., Default value: |
bin | anyOf(boolean, BinParams) | A flag for binning a Default value: |
condition | anyOf(ConditionalValueDef, array(ConditionalValueDef)) | One or more value definition(s) with a selection predicate. Note: A field definition's |
field | anyOf(string, RepeatRef) | Required. A string defining the name of the field from which to pull a data value or an object defining iterated values from the Note: Dots ( Note: |
legend | anyOf(Legend, null) | An object defining properties of the legend. If Default value: If undefined, default legend properties are applied. |
scale | anyOf(Scale, null) | An object defining properties of the channel's scale, which is the function that transforms values in the data domain (numbers, dates, strings, etc.) to visual values (pixels, colors, sizes) of the encoding channels. If Default value: If undefined, default scale properties are applied. |
sort | Sort | Sort order for the encoded field. For continuous fields (quantitative or temporal), For discrete fields, Default value: Note: |
timeUnit | TimeUnit | Time unit (e.g., Default value: |
title | [string, null] | A title for the field. If Default value: derived from the field's name and transformation function ( Notes: |
type | Type | The encoded field's type of measurement ("quantitative", "temporal", "ordinal", or "nominal"). It can also be a "geojson" type for encoding "geoshape". |
The Row and Column encodings accept the following options:
Property | Type | Description |
---|---|---|
aggregate | Aggregate | Aggregation function for the field (e.g., Default value: |
bin | anyOf(boolean, BinParams) | A flag for binning a Default value: |
field | anyOf(string, RepeatRef) | Required. A string defining the name of the field from which to pull a data value or an object defining iterated values from the Note: Dots ( Note: |
header | Header | An object defining properties of a facet's header. |
sort | Sort | Sort order for the encoded field. For continuous fields (quantitative or temporal), For discrete fields, Default value: Note: |
timeUnit | TimeUnit | Time unit (e.g., Default value: |
title | [string, null] | A title for the field. If Default value: derived from the field's name and transformation function ( Notes: |
type | Type | The encoded field's type of measurement ("quantitative", "temporal", "ordinal", or "nominal"). It can also be a "geojson" type for encoding "geoshape". |
The Text and Tooltip encodings accept the following options:
Property | Type | Description |
---|---|---|
aggregate | Aggregate | Aggregation function for the field (e.g., Default value: |
bin | anyOf(boolean, BinParams) | A flag for binning a Default value: |
condition | anyOf(ConditionalValueDef, array(ConditionalValueDef)) | One or more value definition(s) with a selection predicate. Note: A field definition's |
field | anyOf(string, RepeatRef) | Required. A string defining the name of the field from which to pull a data value or an object defining iterated values from the Note: Dots ( Note: |
format | string | The formatting pattern for a text field. If not defined, this will be determined automatically. |
timeUnit | TimeUnit | Time unit (e.g., Default value: |
title | [string, null] | A title for the field. If Default value: derived from the field's name and transformation function ( Notes: |
type | Type | The encoded field's type of measurement ("quantitative", "temporal", "ordinal", or "nominal"). It can also be a "geojson" type for encoding "geoshape". |
The Detail, Key, Latitude, Latitude2, Longitude, Longitude2, X2 and Y2 encodings accept the following options:
Property | Type | Description |
---|---|---|
aggregate | Aggregate | Aggregation function for the field (e.g., Default value: |
bin | anyOf(boolean, BinParams) | A flag for binning a Default value: |
field | anyOf(string, RepeatRef) | Required. A string defining the name of the field from which to pull a data value or an object defining iterated values from the Note: Dots ( Note: |
timeUnit | TimeUnit | Time unit (e.g., Default value: |
title | [string, null] | A title for the field. If Default value: derived from the field's name and transformation function ( Notes: |
type | Type | The encoded field's type of measurement ("quantitative", "temporal", "ordinal", or "nominal"). It can also be a "geojson" type for encoding "geoshape". |
The Href encoding accepts the following options:
Property | Type | Description |
---|---|---|
aggregate | Aggregate | Aggregation function for the field (e.g., Default value: |
bin | anyOf(boolean, BinParams) | A flag for binning a Default value: |
condition | anyOf(ConditionalValueDef, array(ConditionalValueDef)) | One or more value definition(s) with a selection predicate. Note: A field definition's |
field | anyOf(string, RepeatRef) | Required. A string defining the name of the field from which to pull a data value or an object defining iterated values from the Note: Dots ( Note: |
timeUnit | TimeUnit | Time unit (e.g., Default value: |
title | [string, null] | A title for the field. If Default value: derived from the field's name and transformation function ( Notes: |
type | Type | The encoded field's type of measurement ("quantitative", "temporal", "ordinal", or "nominal"). It can also be a "geojson" type for encoding "geoshape". |
The Order encoding accepts the following options:
Property | Type | Description |
---|---|---|
aggregate | Aggregate | Aggregation function for the field (e.g., Default value: |
bin | anyOf(boolean, BinParams) | A flag for binning a Default value: |
field | anyOf(string, RepeatRef) | Required. A string defining the name of the field from which to pull a data value or an object defining iterated values from the Note: Dots ( Note: |
sort | SortOrder | The sort order. One of "ascending" (default) or "descending". |
timeUnit | TimeUnit | Time unit (e.g., Default value: |
title | [string, null] | A title for the field. If Default value: derived from the field's name and transformation function ( Notes: |
type | Type | The encoded field's type of measurement ("quantitative", "temporal", "ordinal", or "nominal"). It can also be a "geojson" type for encoding "geoshape". |
Binning and Aggregation
Beyond simple channel encodings, Altair's visualizations are built on the concept of database-style grouping and aggregation; that is, the split-apply-combine abstraction that underpins many data analysis approaches.
For example, building a histogram from a one-dimensional dataset involves splitting data based on the bin it falls in, aggregating the results within each bin using a count of the data, and then combining the results into a final figure.
In Altair, such an operation looks like this:
alt.Chart(cars).mark_bar().encode(
    alt.X('Horsepower', bin=True),
    y='count()'
    # could also use alt.Y(aggregate='count', type='quantitative')
)
Notice here we use the shorthand version of expressing an encoding channel (see Encoding Shorthands) with the count aggregation, which is the one aggregation that does not require a field to be specified.
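The split, apply, and combine steps behind the histogram can be sketched in plain Python. This is a conceptual illustration rather than what Altair or Vega-Lite executes: `histogram_counts` is a hypothetical helper, the fixed bin width of 50 is an assumption (with bin=True the bin boundaries are chosen automatically), and the horsepower values are made up.

```python
from collections import Counter


def histogram_counts(values, step):
    """Split values into bins of width `step` and count the members of each bin."""
    counts = Counter()
    for v in values:
        bin_start = (v // step) * step   # split: which bin does v fall in?
        counts[bin_start] += 1           # apply: count within that bin
    return dict(sorted(counts.items()))  # combine: one entry per bin


# Made-up horsepower values for illustration.
horsepower = [130, 165, 150, 95, 88, 46, 215, 90]
print(histogram_counts(horsepower, step=50))
# {0: 1, 50: 3, 100: 1, 150: 2, 200: 1}
```

Each key is the lower edge of a bin and each value is the bar height, which is the information the count aggregation supplies to the y channel.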
Similarly, we can create a two-dimensional histogram using, for example, the size of points to indicate counts within the grid (sometimes called a "Bubble Plot"):
alt.Chart(cars).mark_point().encode(
    alt.X('Horsepower', bin=True),
    alt.Y('Miles_per_Gallon', bin=True),
    size='count()',
)
There is no need, however, to limit aggregations to counts alone. For example, we could similarly create a plot where the color of each point represents the mean of a third quantity, such as acceleration:
alt.Chart(cars).mark_circle().encode(
    alt.X('Horsepower', bin=True),
    alt.Y('Miles_per_Gallon', bin=True),
    size='count()',
    color='average(Acceleration):Q'
)
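Conceptually, the chart above groups rows by a two-dimensional bin and computes two aggregates per group. A minimal plain-Python sketch of that grouping follows; `bin_2d` is a hypothetical helper, the bin widths are assumptions, and the rows are made-up values rather than the real cars data.

```python
from collections import defaultdict
from statistics import mean


def bin_2d(rows, x_step, y_step):
    """Group (x, y, z) rows by 2-D bin, then report count and average of z."""
    groups = defaultdict(list)
    for x, y, z in rows:
        key = ((x // x_step) * x_step, (y // y_step) * y_step)
        groups[key].append(z)
    return {
        key: {"count": len(zs), "average": mean(zs)}
        for key, zs in sorted(groups.items())
    }


# Made-up (horsepower, miles per gallon, acceleration) rows.
rows = [
    (130, 18, 12.0),
    (165, 15, 11.5),
    (150, 18, 11.0),
    (95, 24, 15.0),
    (88, 27, 14.5),
]
summary = bin_2d(rows, x_step=50, y_step=10)
for cell, stats in summary.items():
    print(cell, stats)
```

In the chart, the count would drive the size of each circle and the average would drive its color.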
In addition to count and average, there are a large number of available aggregation functions built into Altair; they are listed in the following table:
Aggregate | Description | Example |
---|---|---|
argmin | An input data object containing the minimum field value. | N/A |
argmax | An input data object containing the maximum field value. | N/A |
average | The mean (average) field value. Identical to mean. | Line Chart with Layered Aggregates |
count | The total count of data objects in the group. | Simple Heatmap |
distinct | The count of distinct field values. | N/A |
max | The maximum field value. | Box Plot with Min/Max Whiskers |
mean | The mean (average) field value. | Layered Plot with Dual-Axis |
median | The median field value. | Box Plot with Min/Max Whiskers |
min | The minimum field value. | Box Plot with Min/Max Whiskers |
missing | The count of null or undefined field values. | N/A |
q1 | The lower quartile boundary of values. | Box Plot with Min/Max Whiskers |
q3 | The upper quartile boundary of values. | Box Plot with Min/Max Whiskers |
ci0 | The lower boundary of the bootstrapped 95% confidence interval of the mean. | Error Bars showing Confidence Interval |
ci1 | The upper boundary of the bootstrapped 95% confidence interval of the mean. | Error Bars showing Confidence Interval |
stderr | The standard error of the field values. | N/A |
stdev | The sample standard deviation of field values. | N/A |
stdevp | The population standard deviation of field values. | N/A |
sum | The sum of field values. | Streamgraph |
valid | The count of field values that are not null or undefined. | N/A |
values | A list of data objects in the group. | N/A |
variance | The sample variance of field values. | N/A |
variancep | The population variance of field values. | N/A |
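Several of these aggregates have direct counterparts in Python's standard library, which can help when checking what a chart should show. The sketch below uses the `statistics` module as a stand-in; it is not how Vega-Lite computes them, the values are made up, and aggregates whose exact method may differ (such as the quartiles and confidence intervals) are left out.

```python
import statistics

# Made-up field values for one group.
values = [2, 4, 4, 4, 5, 5, 7, 9]

aggregates = {
    "count": len(values),
    "sum": sum(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "min": min(values),
    "max": max(values),
    "distinct": len(set(values)),
    "stdev": statistics.stdev(values),          # sample: divides by n - 1
    "stdevp": statistics.pstdev(values),        # population: divides by n
    "variance": statistics.variance(values),    # sample variance
    "variancep": statistics.pvariance(values),  # population variance
}
for name, value in aggregates.items():
    print(f"{name}: {value}")
```

The sample forms (stdev, variance) are always at least as large as the population forms (stdevp, variancep) for the same data, because they divide by n - 1 instead of n.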
Encoding Shorthands
For convenience, Altair allows the specification of the variable name along with the aggregate and type within a simple shorthand string syntax. This makes use of the type shorthand codes listed in Data Types as well as the aggregate names listed in Binning and Aggregation. The following table shows examples of the shorthand specification alongside the long-form equivalent:
Shorthand | Equivalent long-form |
---|---|
x='name' | alt.X('name') |
x='name:Q' | alt.X('name', type='quantitative') |
x='sum(name)' | alt.X('name', aggregate='sum') |
x='sum(name):Q' | alt.X('name', aggregate='sum', type='quantitative') |
x='count():Q' | alt.X(aggregate='count', type='quantitative') |
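The mapping in the table can be mimicked with a small regular-expression parser. This is an illustrative stand-in, not Altair's own parsing code: the name `parse_shorthand_sketch` and the regular expression are assumptions, and only the forms shown in the table are handled.

```python
import re

# Type shorthand codes from the Data Types table.
TYPE_CODES = {"Q": "quantitative", "O": "ordinal", "N": "nominal", "T": "temporal"}

# Either "aggregate(field)" or a bare "field", optionally followed by ":<code>".
SHORTHAND = re.compile(
    r"^(?:(?P<aggregate>\w+)\((?P<agg_field>[^)]*)\)|(?P<field>[^:]+))"
    r"(?::(?P<type>[QONT]))?$"
)


def parse_shorthand_sketch(shorthand):
    """Split a shorthand string into field, aggregate, and type entries."""
    match = SHORTHAND.match(shorthand)
    if match is None:
        raise ValueError(f"cannot parse shorthand: {shorthand!r}")
    parts = match.groupdict()
    result = {}
    field = parts["field"] or parts["agg_field"]
    if field:  # 'count()' has no field at all
        result["field"] = field
    if parts["aggregate"]:
        result["aggregate"] = parts["aggregate"]
    if parts["type"]:
        result["type"] = TYPE_CODES[parts["type"]]
    return result


for text in ["name", "name:Q", "sum(name)", "sum(name):Q", "count():Q"]:
    print(text, "->", parse_shorthand_sketch(text))
```

Each resulting dictionary lists the same keyword arguments that appear in the long-form column of the table.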
Ordering marks
The order option and Order channel control the order in which marks are drawn on the chart.
For stacked marks, this controls the order of components of the stack. Here, the elements of each bar are sorted alphabetically by the name of the nominal data in the color channel.
import altair as alt
from vega_datasets import data
barley = data.barley()
alt.Chart(barley).mark_bar().encode(
    x='variety:N',
    y='sum(yield):Q',
    color='site:N',
    order=alt.Order("site", sort="ascending")
)
The order can be reversed by changing the sort option to descending.
import altair as alt
from vega_datasets import data
barley = data.barley()
alt.Chart(barley).mark_bar().encode(
    x='variety:N',
    y='sum(yield):Q',
    color='site:N',
    order=alt.Order("site", sort="descending")
)
The same approach works for other mark types, like stacked area charts.
import altair as alt
from vega_datasets import data
barley = data.barley()
alt.Chart(barley).mark_area().encode(
    x='variety:N',
    y='sum(yield):Q',
    color='site:N',
    order=alt.Order("site", sort="ascending")
)
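The effect of the sort option on a stack can be sketched in plain Python by computing where each segment starts and ends. This is a simplified model, not the renderer's actual logic: `stack_segments` is a hypothetical helper, the yields are made-up numbers, and it assumes the first group in sort order sits at the baseline.

```python
def stack_segments(values_by_group, descending=False):
    """Return (group, start, end) segments stacked in sorted group order."""
    segments = []
    offset = 0
    for group in sorted(values_by_group, reverse=descending):
        value = values_by_group[group]
        segments.append((group, offset, offset + value))
        offset += value
    return segments


# Made-up summed yields for a single variety, keyed by site.
yields = {"Waseca": 40, "Crookston": 30, "Morris": 25}

print(stack_segments(yields))
# [('Crookston', 0, 30), ('Morris', 30, 55), ('Waseca', 55, 95)]
print(stack_segments(yields, descending=True))
# [('Waseca', 0, 40), ('Morris', 40, 65), ('Crookston', 65, 95)]
```

The total height is the same in both cases; only the position of each site within the stack changes.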
For line marks, the order channel encodes the order in which data points are connected. This can be useful for creating a scatterplot that draws lines between the dots using a different field than the x and y axes.
import altair as alt
from vega_datasets import data
driving = data.driving()
alt.Chart(driving).mark_line(point=True).encode(
    alt.X('miles', scale=alt.Scale(zero=False)),
    alt.Y('gas', scale=alt.Scale(zero=False)),
    order='year'
)
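What the order channel does for a line can be pictured as sorting the points by the order field before connecting them. The sketch below shows this with a hypothetical helper, `path_in_order`, and made-up values that only borrow the column names of the driving dataset.

```python
def path_in_order(points, order_key):
    """Return the (miles, gas) vertices in the sequence given by the order field."""
    ordered = sorted(points, key=lambda p: p[order_key])
    return [(p["miles"], p["gas"]) for p in ordered]


# Made-up rows; only the column names match the driving dataset.
points = [
    {"year": 1958, "miles": 3500, "gas": 2.2},
    {"year": 1956, "miles": 3600, "gas": 2.4},
    {"year": 1957, "miles": 3400, "gas": 2.3},
]

print(path_in_order(points, "year"))
# [(3600, 2.4), (3400, 2.3), (3500, 2.2)]
print(path_in_order(points, "miles"))
# [(3400, 2.3), (3500, 2.2), (3600, 2.4)]
```

Ordering by year lets the path double back along the x-axis, which is what produces a connected scatterplot instead of a left-to-right line.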