datenguidepy package

Submodules

datenguidepy.output_transformer module

class datenguidepy.output_transformer.QueryOutputTransformer(query_response)[source]

Bases: object

Transforms the query results into a DataFrame.

Parameters

query_response (List[ExecutionResults]) – Accepts the return type of the query executioner in case a non-None value was returned. This is a list of ExecutionResults because some Python queries may internally be converted into several GraphQL queries, each of which returns one result.

transform(verbose_statistic_names=False, verbose_enum_values=False, add_units=False, remove_duplicates=False)[source]

Transform the query results into a pandas DataFrame.

This function offers several flags that make the results more readable by using meta information about the query. By default the DataFrame is not enriched with meta information, on the assumption that an experienced user is familiar with the particular statistic. For data exploration it is recommended to turn on one or more flags.

Parameters
  • verbose_statistic_names (bool) – Toggles statistic codes to short descriptions.

  • verbose_enum_values (bool) – Toggles enum codes to descriptions if enum columns are present.

  • add_units (bool) – Toggles the addition of a unit column for each statistic to make it easier to interpret the numbers.

  • remove_duplicates (bool) – Removes duplicates from query results, i.e. if the exact same number has been reported for the same statistic, year, region etc. from the same source it gets removed. Such duplications are sometimes caused on the API side and this is convenience functionality to remove them. The removal happens before potentially joining several different statistics.

Return type

DataFrame

Returns

A pandas DataFrame of the query results.
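As a rough illustration of what remove_duplicates describes, the following sketch drops records that agree in every column. It is plain Python and not the library's implementation; the helper name drop_exact_duplicates, the record layout and the numbers are made up for illustration.

```python
from typing import Dict, List

Record = Dict[str, object]


def drop_exact_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of every fully identical record."""
    seen = set()
    unique = []
    for record in records:
        # Every column takes part in the comparison, so only exact repeats match.
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical rows: the second one repeats the first one exactly.
rows = [
    {"id": "11", "year": 2017, "value": 100, "source": "A"},
    {"id": "11", "year": 2017, "value": 100, "source": "A"},
    {"id": "11", "year": 2018, "value": 105, "source": "A"},
]
deduplicated = drop_exact_duplicates(rows)
```

Rows that differ in any column, for example the year, are kept.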

datenguidepy.query_builder module

class datenguidepy.query_builder.Field(name, fields=[], args={}, parent_field=None, default_fields=True, return_type=None, stat_meta_data_provider=None)[source]

Bases: object

A field of a query that specifies a statistic (or other information, e.g. source) to query. A field is defined by its name (usually a statistic), its filters (specified with args) and the desired output information (fields).

Parameters
  • name (str) – Name of Field (statistic)

  • fields (list, optional) – desired output fields (e.g. year or NAT), defaults to []

  • args (dict, optional) – Filters for the desired field (e.g. {‘year’: 2017}). If “ALL” is passed as a value, results are returned for all possible subgroups (e.g. for gender, ‘GES’: ‘ALL’ returns three data entries: male, female and the sum of both). If the filter is not set, only the summed result is returned. The exception is year, which is returned for each year by default. Defaults to {}

  • parent_field (Field, optional) – The field this field is attached to, defaults to None

  • default_fields (bool, optional) – Whether default fields should be attached or not, defaults to True

  • return_type (str, optional) – The GraphQL return type of this field, defaults to None
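The filter semantics of args described above can be modelled with a small sketch. It only mimics the documented behaviour on made-up rows and does not call the API; the function select_rows, the sample values and the convention that None marks the summed entry are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Made-up sample rows; None in "GES" marks the entry summed over both genders.
ROWS: List[Dict[str, Any]] = [
    {"year": 2017, "GES": "GESM", "value": 10},
    {"year": 2017, "GES": "GESW", "value": 12},
    {"year": 2017, "GES": None, "value": 22},
    {"year": 2018, "GES": None, "value": 23},
]


def select_rows(
    rows: List[Dict[str, Any]],
    args: Dict[str, Any],
    dimensions: Tuple[str, ...] = ("GES",),
) -> List[Dict[str, Any]]:
    """Mimic the documented filter behaviour on in-memory rows."""
    selected = []
    for row in rows:
        # year is special: without a filter, every year is kept.
        if "year" in args and row["year"] != args["year"]:
            continue
        keep = True
        for dim in dimensions:
            wanted = args.get(dim)  # None when the filter is not set
            # "ALL" keeps every subgroup; an unset filter keeps only the sum.
            if wanted != "ALL" and row[dim] != wanted:
                keep = False
        if keep:
            selected.append(row)
    return selected


only_sums = select_rows(ROWS, {})
all_groups_2017 = select_rows(ROWS, {"year": 2017, "GES": "ALL"})
```

With no filter, only the summed entries are selected, one per year. With ‘GES’: ‘ALL’ and a year filter, the male, female and summed entries for that year are selected.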

add_args(args)[source]

Add arguments to the field.

Parameters

args (dict) – Arguments to be added.

add_field(field, default_fields=None)[source]

Add a subfield to the field.

Raises

TypeError – If the added field is neither a str nor a Field.

Returns

the added field

Return type

Field

arguments_info()[source]

Get information on possible arguments for field. The name of the argument is followed by the kind and name of the input type for the argument in brackets. If the argument is a list, the kind and name of the list elements are included in the brackets as well.

Returns

Possible arguments for the field as string and their input types.

Return type

Optional[str]

description()[source]

Get description of field.

Returns

Description of the field as string.

Return type

Optional[str]

drop_field(field)[source]

Drop an attached subfield of the field.

Parameters

field (str) – The name of the field to be dropped.

Returns

The field without the subfield.

Return type

Field

enum_info()[source]

Get information on possible enum values for field.

Returns

Possible enum values for the field as string.

Return type

Optional[str]

fields_info()[source]

Get information on possible fields for field.

Returns

Possible fields for the field as string

Return type

Optional[str]

get_fields()[source]

Get all fields that are attached to this field or subfields of this field.

Returns

a list of all fields

Return type

List[str]

get_info()[source]

Prints summarized information on a field’s meta data.

Returns

None

Return type

None

class datenguidepy.query_builder.Query(start_field, region_field=None, default_fields=True, stat_meta_data_provider=None)[source]

Bases: object

A query to get information via the datenguide API for regionalstatistik. The query contains all fields and arguments.

Parameters
  • start_field (Field) – The top node field; either allRegions or Region.

  • region_field (Field, optional) – A field of type ‘Region’ that is needed if start_field is allRegions, defaults to None

  • default_fields (bool) – Whether default fields shall be attached to the fields, defaults to True

Raises

RuntimeError – [description]

add_field(field, default_fields=None)[source]

Add a field to the query.

Parameters
  • field (Union[str, Field]) – Field to be added

  • default_fields (bool, optional) – Whether default fields should be attached or not, defaults to None

Raises

RuntimeError – If the allRegions Query has no regions field a subfield can be attached to.

Returns

The added field.

Return type

Field

classmethod all_regions(fields=[], parent=None, nuts=None, lau=None, default_fields=True, stat_meta_data_provider=None)[source]

Factory method to instantiate a Query with allRegions start field. A parent id, nuts or lau can be further specified for the query.

Parameters
  • fields (list) – all fields that shall be returned for that region. Can either be simple fields (e.g. name) or fields with nested fields.

  • parent (Optional[str]) – The region id of the parent region the statistics shall be queried for. (E.g. the id for a state where all sub regions within the state shall be queried for.)

  • nuts (int, optional) – The administration level: 1 (Bundesländer), 2 (Regierungsbezirke / statistische Regionen) or 3 (Kreise / kreisfreie Städte). Default None returns results for all levels.

  • lau (int, optional) – The administration level: 1 (Verwaltungsgemeinschaften) or 2 (Gemeinden). Default returns results for all levels.

  • default_fields (bool) – Whether default fields shall be attached to the fields.

Returns

A query object with allRegions as start Field.

Return type

Query
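To make the effect of the parent and nuts parameters more concrete, the sketch below filters a tiny in-memory region table. It does not call the API. The helper filter_regions is hypothetical, the row for Bayern is added for illustration, and treating parent as matching direct children only is an assumption.

```python
from typing import Dict, List, Optional

# A few sample regions with their level and parent id.
REGIONS: List[Dict[str, str]] = [
    {"id": "09", "name": "Bayern", "level": "nuts1", "parent": "DG"},
    {"id": "10", "name": "Saarland", "level": "nuts1", "parent": "DG"},
    {"id": "093", "name": "Oberpfalz, Regierungsbezirk", "level": "nuts2", "parent": "09"},
    {"id": "094", "name": "Oberfranken, Regierungsbezirk", "level": "nuts2", "parent": "09"},
]


def filter_regions(
    regions: List[Dict[str, str]],
    parent: Optional[str] = None,
    nuts: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Select regions by parent id and/or nuts level; None means no restriction."""
    selected = []
    for region in regions:
        if parent is not None and region["parent"] != parent:
            continue
        if nuts is not None and region["level"] != f"nuts{nuts}":
            continue
        selected.append(region)
    return selected


states = filter_regions(REGIONS, nuts=1)
within_bayern = filter_regions(REGIONS, parent="09")
```

Leaving both parameters at None selects every region, which mirrors the documented default of returning results for all levels.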

drop_field(field)[source]

Drop an attached field of the query.

Parameters

field (str) – The name of the field to be dropped

Raises

RuntimeError – Raised if the Query is initialized without a regions field.

Returns

the query without the dropped field

Return type

Query

get_fields()[source]

Get all fields of a query.

Returns

a list of field names

Return type

List[str]

get_graphql_query()[source]

Formats the Query into a String that can be queried from the Datenguide API.

Returns

the Query formatted for the GraphQL API as a List of query strings

Return type

List[str]

get_info(field=None)[source]

Get information on a specific field. If field is not specified return meta data for all statistics that can be queried.

Parameters

field (str, optional) – the field to get information on. If None, then information on all possible fields of a query are returned, defaults to None

Returns

Response from QueryExecutioner on meta data info

Return type

Optional[TypeMetaData]

meta_data()[source]

Runs the query and returns a Dict with the meta data of the query results.

Raises

RuntimeError – If the Query did not return any results, e.g. if the Query was ill-formed.

Returns

A Dict with the queried meta data.

Return type

Union[Dict[str, Any], List[Dict[str, Any]]]

classmethod region(region, fields=[], default_fields=True, stat_meta_data_provider=None)[source]

Factory method to instantiate a Query with a single region through its region id.

Parameters
  • region (Union[str, List[str]]) – The region id(s) the statistics shall be returned for

  • fields (list) – all fields that shall be returned from the query for that region. Can either be simple fields (e.g. name) or fields with nested fields.

  • default_fields (bool) – Whether default fields shall be attached to the fields.

Raises

RuntimeError – [description]

Returns

A query object with region as start Field.

Return type

Query

results(verbose_statistics=False, verbose_enums=False, add_units=False, remove_duplicates=True)[source]

Runs the query and returns a pandas DataFrame with the results.

It also fills the instance variable result_meta_data with meta data specific to the query instance.

Parameters
  • verbose_statistics (bool) – Toggles whether statistic column names are displayed with their short description in the result DataFrame

  • verbose_enums (bool) – Toggles whether enum values are displayed with their short description in the result data frame

  • add_units (bool) – Adds units available in the meta data to the result DataFrame. Care should be taken, because not every statistic specifies these correctly. When in doubt, refer to the statistic description.

  • remove_duplicates (bool) – Removes duplicates from query results, i.e. if the exact same number has been reported for the same statistic, year, region etc. from the same source it gets removed. Such duplications are sometimes caused on the API side and this is convenience functionality to remove them. The removal happens before potentially joining several different statistics. Unless the API itself is being diagnosed, the default (True) is generally in the user's interest.

Raises

RuntimeError – If the query fails.

Returns

A DataFrame with the queried data.

Return type

DataFrame

datenguidepy.query_execution module

class datenguidepy.query_execution.ExecutionResults(query_results: List[Dict[str, Any]], meta_data: Dict[str, Union[Dict[str, str], Dict[str, Dict[Optional[str], str]]]])[source]

Bases: tuple

Results of a query, consisting of the results themselves and the corresponding meta data.

contains_undefined_region_result()[source]

property meta_data

Alias for field number 1

property query_results

Alias for field number 0
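Because ExecutionResults is based on tuple, its two fields can be read by name or by position. The sketch below shows that pattern with a stand-in class. ExecutionResultsSketch is not the library class, and the contents of both fields are made up for illustration.

```python
from typing import Any, Dict, List, NamedTuple


class ExecutionResultsSketch(NamedTuple):
    """Stand-in with the same field order: query_results first, meta_data second."""

    query_results: List[Dict[str, Any]]
    meta_data: Dict[str, Any]


# Hypothetical payloads; the real structure depends on the query that was run.
result = ExecutionResultsSketch(
    query_results=[{"id": "11", "name": "Berlin"}],
    meta_data={"example_key": {"example_code": "example description"}},
)

# Field number 0 and field number 1 are the same objects as the named fields.
first = result[0]
second = result[1]

# Tuple unpacking works as well.
query_results, meta_data = result
```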

class datenguidepy.query_execution.FieldMetaDict[source]

Bases: dict

[description]

get_arguments()[source]

[summary]

Returns

[description]

Return type

Dict[str, Tuple[Optional[str], …]]

get_return_type()[source]

Returns the return type of the field of the FieldMetaDict.

Returns

The return type of the field.

Return type

str

class datenguidepy.query_execution.GraphQlSchemaMetaDataProvider(endpoint=None)[source]

Bases: object

The GraphQlSchema meta data provider helps to obtain meta data about the structure of the GraphQL API. As such it helps to provide information on how structurally correct queries are built. It does not directly supply information about statistics.

REQUEST_HEADER: Dict[str, str] = {'Content-Type': 'application/json'}

endpoint: str = 'https://api-next.datengui.de/graphql'

get_type_info(graph_ql_type, verbose=False)[source]

Returns a json which at top level is a dict with all the fields of the type

Parameters
  • graph_ql_type (str) – [description]

  • verbose (bool, optional) – [description], defaults to False

Returns

[description]

Return type

Optional[TypeMetaData]

class datenguidepy.query_execution.QueryExecutioner(alternative_endpoint=None, statistics_meta_data_provider=None)[source]

Bases: object

Queries the Datenguide API for data and meta data.

Parameters

alternative_endpoint (Optional[str], optional) – [description], defaults to None

Returns

[description]

Return type

None

REQUEST_HEADER: Dict[str, str] = {'Content-Type': 'application/json'}

endpoint: str = 'https://api-next.datengui.de/graphql'

get_type_info(graph_ql_type, verbose=False)[source]

Returns a json which at top level is a dict with all the fields of the type

Parameters
  • graph_ql_type (str) – [description]

  • verbose (bool, optional) – [description], defaults to False

Returns

[description]

Return type

Optional[TypeMetaData]

run_query(query)[source]

[summary]

Parameters

query ([type]) – [description]

Returns

[description]

Return type

Optional[List[ExecutionResults]]

class datenguidepy.query_execution.StatisticsGraphQlMetaDataProvider(endpoint=None)[source]

Bases: object

Statistics meta data providers help to supply information about details pertaining to certain statistics that can be obtained via the API. This type of meta information is not API specific and can be obtained from different sources. This particular data provider uses GraphQL meta data information to provide results.

get_query_enum_meta(query_fields_with_types)[source]
Return type

Dict[str, Dict[Optional[str], str]]

get_query_stat_meta(query_fields_with_types)[source]
Return type

Dict[str, str]

get_query_unit_meta(query_fields_with_types)[source]
Return type

Dict[str, str]

get_stat_descriptions()[source]

[summary]

Returns

[description]

Return type

[type]

is_statistic(stat_candidate)[source]
Return type

bool

class datenguidepy.query_execution.StatisticsMetaDataProvider(*args, **kwds)[source]

Bases: Protocol

get_query_enum_meta(query_fields_with_types)[source]
Return type

Dict[str, Dict[Optional[str], str]]

get_query_stat_meta(query_fields_with_types)[source]
Return type

Dict[str, str]

get_query_unit_meta(query_fields_with_types)[source]
Return type

Dict[str, str]

get_stat_descriptions()[source]
Return type

Dict[str, Tuple[str, str]]

is_statistic(stat_candidate)[source]
Return type

bool
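Since StatisticsMetaDataProvider is a Protocol, any object offering these five methods can presumably serve as a provider without inheriting from it. The sketch below demonstrates that structural-typing pattern with stand-in classes. ProviderSketch and InMemoryProvider are hypothetical, the statistic codes are made up, and treating query_fields_with_types as an iterable of (field name, type name) pairs is an assumption.

```python
from typing import Dict, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ProviderSketch(Protocol):
    """Stand-in protocol listing the five documented method names."""

    def get_query_stat_meta(self, query_fields_with_types) -> Dict[str, str]: ...
    def get_query_unit_meta(self, query_fields_with_types) -> Dict[str, str]: ...
    def get_query_enum_meta(
        self, query_fields_with_types
    ) -> Dict[str, Dict[Optional[str], str]]: ...
    def get_stat_descriptions(self) -> Dict[str, Tuple[str, str]]: ...
    def is_statistic(self, stat_candidate) -> bool: ...


class InMemoryProvider:
    """Serves meta data from plain dictionaries instead of a remote source."""

    def __init__(self, descriptions: Dict[str, Tuple[str, str]], units: Dict[str, str]):
        self._descriptions = descriptions
        self._units = units

    def get_query_stat_meta(self, query_fields_with_types):
        # Short description for every queried field that is a known statistic.
        return {
            name: self._descriptions[name][0]
            for name, _type in query_fields_with_types
            if self.is_statistic(name)
        }

    def get_query_unit_meta(self, query_fields_with_types):
        return {
            name: self._units[name]
            for name, _type in query_fields_with_types
            if name in self._units
        }

    def get_query_enum_meta(self, query_fields_with_types):
        return {}  # this sketch has no enum columns

    def get_stat_descriptions(self):
        return dict(self._descriptions)

    def is_statistic(self, stat_candidate):
        return stat_candidate in self._descriptions


provider = InMemoryProvider(
    descriptions={"STAT_A": ("Population", "Population on 31 December")},
    units={"STAT_A": "persons"},
)
queried = [("STAT_A", "STAT_A"), ("name", "String")]
stat_meta = provider.get_query_stat_meta(queried)
```

The isinstance check against a runtime-checkable protocol only verifies that the methods exist, not that their signatures or return types match.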

class datenguidepy.query_execution.StatisticsSchemaJsonMetaDataProvider[source]

Bases: object

Statistics meta data providers help to supply information about details pertaining to certain statistics that can be obtained via the API. This type of meta information is not API specific and can be obtained from different sources. This particular data provider uses a hard copy of a schema file from the SOAP cubes that datenguide extracts from GENESIS and transfers into their API.

get_enum_values()[source]
Return type

Dict[str, Dict[str, str]]

get_query_enum_meta(query_fields_with_types)[source]
Return type

Dict[str, Dict[Optional[str], str]]

get_query_stat_meta(query_fields_with_types)[source]
Return type

Dict[str, str]

get_query_unit_meta(query_fields_with_types)[source]
Return type

Dict[str, str]

get_stat_descriptions()[source]
Return type

Dict[str, Tuple[str, str]]

get_stat_units()[source]
Return type

Dict[str, str]

is_statistic(stat_candidate)[source]
Return type

bool

property stat_names

class datenguidepy.query_execution.TypeMetaData(kind: str, fields: Optional[Dict[str, Any]], enum_values: Optional[Dict[str, str]])[source]

Bases: tuple

The meta data of a field, which consists of the kind, fields and enum values.

property enum_values

Alias for field number 2

property fields

Alias for field number 1

property kind

Alias for field number 0

datenguidepy.query_execution.check_http200_body_error(body_json)[source]
Return type

None

datenguidepy.query_helper module

class datenguidepy.query_helper.ConfigMapping(mapping)[source]

Bases: object

[summary]

Parameters

mapping (Dict[str, Any]) – [description]

datenguidepy.query_helper.download_all_regions()[source]

Downloads all current regions and their hierarchy structure.

Raises
  • RuntimeError – [description]

  • RuntimeError – [description]

Returns

[description]

Return type

pd.DataFrame

datenguidepy.query_helper.get_availability_summary()[source]

Summary of available data for region/statistic combinations.

There are many regions and statistics available within the datenguide API/at the original sources. Nonetheless data is not available for all combinations of statistics and regions. Furthermore some statistics might have been discontinued after a certain point in time.

To help with the search for available statistics, the function provides results from an availability analysis for all statistics and all regions at the nuts1, nuts2 and nuts3 levels. It returns the results of this analysis, which contain for each analyzed region/statistic pair the corresponding id/code, the number of entries in the database and, if applicable, the first and last year in which the statistic appeared.

The result does not cover the lau regions, nor does it cover possible drilldowns within statistics. For instance, it does not show whether a statistic is available for men and women individually in addition to the combined population.

Return type

DataFrame

Returns

Table with available statistics.

datenguidepy.query_helper.get_regions()[source]

List of all the regions and their hierarchy structure.

This function returns a DataFrame of all the regions. It contains the name of each region and its id. The latter is required to build queries. Additionally, information about the hierarchy structure is provided by listing the parent region for each region. Furthermore, each region's statistical classification (nuts/lau) is provided to allow for more filter options.

For performance reasons this is simply read from disk. The regions are not expected to change significantly over time. Nonetheless, an up-to-date DataFrame can be obtained with download_all_regions.

Return type

DataFrame

Returns

DataFrame with all regions.
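Listing the parent of each region is enough to reconstruct the whole chain of enclosing regions. The sketch below walks such a parent mapping upwards. It uses a plain dictionary rather than the DataFrame; the helper ancestors is hypothetical, and the entry for region 09 is added for illustration.

```python
from typing import Dict, List

# region id -> parent id, for a handful of sample regions
PARENT_OF: Dict[str, str] = {
    "093": "09",  # Oberpfalz, Regierungsbezirk
    "09": "DG",   # added for illustration
    "10": "DG",   # Saarland
}


def ancestors(region_id: str, parent_of: Dict[str, str]) -> List[str]:
    """Return the ids of all enclosing regions, nearest parent first."""
    chain = []
    current = parent_of.get(region_id)
    while current is not None:
        chain.append(current)
        # The walk stops at the first id that has no parent entry.
        current = parent_of.get(current)
    return chain


chain_of_093 = ancestors("093", PARENT_OF)
```

The walk ends at the top-level id DG because it has no parent entry in the mapping.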

datenguidepy.query_helper.get_statistics(search=None, stat_meta_data_provider=None, target_language='de', translation_provider=None)[source]

List of all the currently available statistics.

This function returns a DataFrame of all available statistics. It contains the statistic code, which is required by the queries. It also contains a short and a long description of each statistic. By default it returns all available statistics, but it also offers the option to filter them with a search keyword.

The original statistic descriptions are in German, but the function can also return a machine-translated English version of these descriptions.

Parameters
  • search (Optional[str]) – Search term used for non-case-sensitive search in the long description

  • translation_provider (Optional[TranslationProvider]) – Object used for translating the statistics. Defaults to default translation provider if None

  • target_language (str) – language to translate statistic descriptions to, Possible values are currently ‘de’, ‘en’ for the default translation provider.

  • stat_meta_data_provider – Source object used to obtain the statistic descriptions. Uses global default if missing.

Return type

DataFrame

Returns

Table with available statistics.
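The search parameter is documented as a non-case-sensitive search in the long description. A minimal sketch of that kind of filter is shown below. It works on a plain dictionary rather than a DataFrame; the function search_statistics, the statistic codes and the descriptions are all made up, and storing the texts as (short, long) pairs is an assumption.

```python
from typing import Dict, Optional, Tuple

# Hypothetical statistics: code -> (short description, long description)
SAMPLE: Dict[str, Tuple[str, str]] = {
    "STAT_A": ("Population", "Population on 31 December"),
    "STAT_B": ("Vehicles", "Number of registered motor vehicles"),
}


def search_statistics(
    statistics: Dict[str, Tuple[str, str]], search: Optional[str] = None
) -> Dict[str, Tuple[str, str]]:
    """Return all statistics, or only those whose long description contains search."""
    if search is None:
        return dict(statistics)
    needle = search.casefold()  # casefold makes the comparison case-insensitive
    return {
        code: texts
        for code, texts in statistics.items()
        if needle in texts[1].casefold()
    }


matches = search_statistics(SAMPLE, "POPULATION")
```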

datenguidepy.query_helper.hirachy_down(highest_ids, lowest_level='lau', hirachy_frame=ALL_REGIONS)[source]

[summary]

Parameters
  • highest_ids (str) – [description]

  • lowest_level (str, optional) – [description], defaults to “lau”

  • hirachy_frame (pd.DataFrame, optional) – [description], defaults to ALL_REGIONS

Raises
  • RuntimeError – [description]

  • RuntimeError – [description]

Returns

[description]

Return type

pd.DataFrame

datenguidepy.query_helper.hirachy_up(lowestids, hirachy_frame=ALL_REGIONS)[source]

[summary]

Parameters
  • lowestids (str) – [description]

  • hirachy_frame (pd.DataFrame, optional) – [description], defaults to ALL_REGIONS

Raises
  • RuntimeError – [description]

  • RuntimeError – [description]

Returns

[description]

Return type

pd.DataFrame

datenguidepy.query_helper.siblings(region_id, hirachy_frame=ALL_REGIONS)[source]

[summary]

Parameters
  • region_id (pd.DataFrame) – [description]

  • hirachy_frame (pd.DataFrame, optional) – [description], defaults to ALL_REGIONS

Raises
  • RuntimeError – [description]

  • RuntimeError – [description]

Returns

[description]

Return type

pd.DataFrame

Module contents