datenguidepy package
Submodules
datenguidepy.output_transformer module
- class datenguidepy.output_transformer.QueryOutputTransformer(query_response)
Bases: object
Transforms the query results into a DataFrame.
- Parameters
query_response (List[ExecutionResults]) – The return value of the query executioner, provided it is not None. This is a list of ExecutionResults because a single Python query may internally be converted into several GraphQL queries, each of which returns one result.
- transform(verbose_statistic_names=False, verbose_enum_values=False, add_units=False, remove_duplicates=False)
Transform the query results into a pandas DataFrame.
This function accepts several flags that make the results more readable by using meta information about the query. By default the DataFrame is not enriched with meta information, on the assumption that an experienced user is familiar with the particular statistic. For data exploration it is recommended to turn on one or more flags.
- Parameters
verbose_statistic_names (bool) – Replaces statistic codes with short descriptions.
verbose_enum_values (bool) – Replaces enum codes with descriptions if enum columns are present.
add_units (bool) – Adds a unit column for each statistic to make the numbers easier to interpret.
remove_duplicates (bool) – Removes duplicates from the query results, i.e. if the exact same number has been reported for the same statistic, year, region etc. from the same source, the repeated entries are removed. Such duplicates are sometimes caused on the API side, and this is a convenience feature to remove them. The removal happens before several different statistics are potentially joined.
- Return type
DataFrame
- Returns
A pandas DataFrame of the query results.
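The effect of remove_duplicates can be pictured with plain Python. The following is only a sketch and not the library's implementation: the row layout (region, year, statistic, value, source), the statistic code STAT1 and the helper name drop_exact_duplicates are assumptions made for illustration.

```python
from typing import Dict, Hashable, List

Row = Dict[str, Hashable]


def drop_exact_duplicates(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of every fully identical row (hypothetical helper)."""
    seen = set()
    unique = []
    for row in rows:
        # Two rows count as duplicates only if every column matches.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up rows; the second one repeats the first exactly.
rows = [
    {"region": "11", "year": 2017, "statistic": "STAT1", "value": 100, "source": "A"},
    {"region": "11", "year": 2017, "statistic": "STAT1", "value": 100, "source": "A"},
    {"region": "11", "year": 2018, "statistic": "STAT1", "value": 100, "source": "A"},
]
deduplicated = drop_exact_duplicates(rows)
print(len(deduplicated))
```

The 2018 row survives although its value equals the 2017 value, because only rows that agree in every column are treated as duplicates.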
datenguidepy.query_builder module
- class datenguidepy.query_builder.Field(name, fields=[], args={}, parent_field=None, default_fields=True, return_type=None, stat_meta_data_provider=None)
Bases: object
A field of a query that specifies a statistic (or other information, e.g. the source) to query. The name of the field (usually a statistic), the filters (specified with args) and the desired output information (fields) are specified.
- Parameters
name (str) – Name of the field (statistic).
fields (list, optional) – Desired output fields (e.g. year or NAT), defaults to [].
args (Dict[str, Any], optional) – Filters for the desired field (e.g. {'year': 2017}), defaults to {}. If "ALL" is passed as a value, results are returned for all possible subgroups. For example, for gender, 'GES': 'ALL' returns three data entries: male, female, and the sum of both. If the filter is not set, only the summed result is returned. The exception is year, for which results are returned for each year by default.
parent_field (Optional[Field], optional) – The field this field is attached to, defaults to None.
default_fields (bool, optional) – Whether default fields should be attached or not, defaults to True.
return_type (Optional[str], optional) – The GraphQL return type of this field, defaults to None.
- add_args(args)
Add arguments to the field.
- Parameters
args (dict) – Arguments to be added.
- add_field(field, default_fields=None)
Add a subfield to the field.
- Raises
TypeError – If the added field is neither a string nor a Field.
- Returns
The added field.
- Return type
Field
- arguments_info()
Get information on possible arguments for field. The name of the argument is followed by the kind and name of the input type for the argument in brackets. If the argument is a list, the kind and name of the list elements are included in the brackets as well.
- Returns
Possible arguments for the field as string and their input types.
- Return type
Optional[str]
- description()
Get description of field.
- Returns
Description of the field as string.
- Return type
Optional[str]
- drop_field(field)
Drop an attached subfield of the field.
- Parameters
field (str) – The name of the field to be dropped.
- Returns
The field without the subfield.
- Return type
Field
- enum_info()
Get information on possible enum values for field.
- Returns
Possible enum values for the field as string.
- Return type
Optional[str]
- fields_info()
Get information on possible fields for field.
- Returns
Possible fields for the field as string.
- Return type
Optional[str]
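The add_field and drop_field behaviour documented for Field can be pictured as a small tree of named nodes. The class below is a hypothetical stand-in written for illustration, not the datenguidepy class itself: SketchField, its subfields attribute and the statistic code STAT1 are assumptions, and default fields and meta data lookups are left out entirely.

```python
from typing import Any, Dict, List, Optional, Union


class SketchField:
    """Hypothetical stand-in for a query field; not the datenguidepy class."""

    def __init__(self, name: str, args: Optional[Dict[str, Any]] = None) -> None:
        self.name = name
        self.args: Dict[str, Any] = dict(args or {})
        self.subfields: List["SketchField"] = []

    def add_field(self, field: Union[str, "SketchField"]) -> "SketchField":
        # Accept either a plain name or an already constructed field.
        if isinstance(field, str):
            field = SketchField(field)
        elif not isinstance(field, SketchField):
            raise TypeError("field must be a str or a SketchField")
        self.subfields.append(field)
        return field

    def drop_field(self, name: str) -> "SketchField":
        # Remove every attached subfield with the given name.
        self.subfields = [f for f in self.subfields if f.name != name]
        return self


statistic = SketchField("STAT1", args={"year": 2017, "GES": "ALL"})
statistic.add_field("year")
statistic.add_field("value")
statistic.drop_field("year")
print([f.name for f in statistic.subfields])
```

Returning the added field from add_field makes it possible to keep attaching nested subfields to it, while drop_field returns the parent so that calls can be chained.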
- class datenguidepy.query_builder.Query(start_field, region_field=None, default_fields=True, stat_meta_data_provider=None)
Bases: object
A query to get information via the datenguide API for regionalstatistik. The query contains all fields and arguments.
- Parameters
region_field (Field, optional) – Relevant if start_field is allRegions, defaults to None.
default_fields (bool) – Whether default fields shall be attached to the fields, defaults to True.
- Raises
RuntimeError – [description]
- add_field(field, default_fields=None)
Add a field to the query.
- Parameters
field (Union[str, Field]) – Field to be added.
default_fields (Optional[bool], optional) – Whether default fields should be attached or not, defaults to None.
- Raises
RuntimeError – If the allRegions Query has no regions field a subfield can be attached to.
- Returns
The added field.
- Return type
Field
- classmethod all_regions(fields=[], parent=None, nuts=None, lau=None, default_fields=True, stat_meta_data_provider=None)
Factory method to instantiate a Query with allRegions as the start field. A parent id, nuts level or lau level can be further specified for the query.
- Parameters
fields (list) – All fields that shall be returned for that region. Can either be simple fields (e.g. name) or fields with nested fields.
parent (Optional[str]) – The region id of the parent region the statistics shall be queried for (e.g. the id of a state, if all subregions within the state shall be queried).
nuts (int, optional) – The administration level: 1 – Bundesländer, 2 – Regierungsbezirke / statistische Regionen, 3 – Kreise / kreisfreie Städte. The default None returns results for all levels.
lau (int, optional) – The administration level: 1 – Verwaltungsgemeinschaften, 2 – Gemeinden. The default None returns results for all levels.
default_fields (bool) – Whether default fields shall be attached to the fields.
- Returns
A query object with allRegions as start field.
- Return type
Query
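As a reading aid for the nuts and lau parameters, the level numbers listed above can be kept in two lookup tables. This is a sketch only: describe_levels is a hypothetical helper that is not part of datenguidepy, and the source does not say how the library treats a call that sets both nuts and lau, so the helper simply reports both.

```python
from typing import Optional

# Level names copied from the parameter descriptions of all_regions.
NUTS_LEVELS = {
    1: "Bundesländer",
    2: "Regierungsbezirke / statistische Regionen",
    3: "Kreise / kreisfreie Städte",
}
LAU_LEVELS = {
    1: "Verwaltungsgemeinschaften",
    2: "Gemeinden",
}


def describe_levels(nuts: Optional[int] = None, lau: Optional[int] = None) -> str:
    """Hypothetical helper: spell out which administration levels were requested."""
    parts = []
    if nuts is not None:
        parts.append(f"nuts {nuts}: {NUTS_LEVELS[nuts]}")
    if lau is not None:
        parts.append(f"lau {lau}: {LAU_LEVELS[lau]}")
    # Leaving both at None corresponds to "all levels" in the documentation.
    return "; ".join(parts) if parts else "all levels"


print(describe_levels(nuts=3))
print(describe_levels())
```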
- drop_field(field)
Drop an attached field of the query.
- Parameters
field (str) – The name of the field to be dropped.
- Raises
RuntimeError – If the Query is initialized without a regions field.
- Returns
The query without the dropped field.
- Return type
Query
- get_graphql_query()
Formats the Query into strings that can be sent to the Datenguide API.
- Returns
The Query formatted for the GraphQL API as a list of query strings.
- Return type
List[str]
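How nested fields and their arguments could end up in a GraphQL query string is sketched below. This is an illustration under stated assumptions and not the formatting code of datenguidepy: the tuple-based FieldSpec layout, the helpers render_value and render_field, the statistic code STAT1 and the region(id: ...) argument are all chosen for the example. It produces a single string, whereas get_graphql_query returns a list of strings, and GraphQL enum values (which are written without quotes) are not handled.

```python
import json
from typing import Any, Dict, List, Tuple, Union

# A field is either a bare name or a (name, args, subfields) tuple.
FieldSpec = Union[str, Tuple[str, Dict[str, Any], List[Any]]]


def render_value(value: Any) -> str:
    """Render a Python value as a GraphQL literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        # JSON string escaping is used here for the quoted string literal.
        return json.dumps(value)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(render_value(v) for v in value) + "]"
    return str(value)


def render_field(field: FieldSpec) -> str:
    """Render one field, its arguments and its nested subfields."""
    if isinstance(field, str):
        return field
    name, args, subfields = field
    text = name
    if args:
        rendered = ", ".join(f"{k}: {render_value(v)}" for k, v in args.items())
        text += f"({rendered})"
    if subfields:
        text += " {" + " ".join(render_field(f) for f in subfields) + "}"
    return text


statistic = ("STAT1", {"year": 2017}, ["year", "value"])
query = "{" + render_field(("region", {"id": "11"}, ["name", statistic])) + "}"
print(query)  # {region(id: "11") {name STAT1(year: 2017) {year value}}}
```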
- get_info(field=None)
Get information on a specific field. If field is not specified, meta data for all statistics that can be queried is returned.
- Parameters
field (str, optional) – The field to get information on. If None, information on all possible fields of a query is returned, defaults to None.
- Returns
Response from QueryExecutioner on meta data info
- Return type
Optional[TypeMetaData]
- meta_data()
Runs the query and returns a Dict with the meta data of the query results.
- Raises
RuntimeError – If the Query did not return any results, e.g. if the Query was ill-formed.
- Returns
A Dict with the queried meta data.
- Return type
Union[Dict[str, Any], List[Dict[str, Any]]]
- classmethod region(region, fields=[], default_fields=True, stat_meta_data_provider=None)
Factory method to instantiate a Query with a single region through its region id.
- Parameters
region (Union[str, List[str]]) – The region id(s) the statistics shall be returned for.
fields (list) – All fields that shall be returned from the query for that region. Can either be simple fields (e.g. name) or fields with nested fields.
default_fields (bool) – Whether default fields shall be attached to the fields.
- Raises
RuntimeError – [description]
- Returns
A query object with region as start field.
- Return type
Query
- results(verbose_statistics=False, verbose_enums=False, add_units=False, remove_duplicates=True)
Runs the query and returns a pandas DataFrame with the results. It also fills the instance variable result_meta_data with meta data specific to the query instance.
- Parameters
verbose_statistics (bool) – Toggles whether statistic column names are displayed with their short description in the result DataFrame.
verbose_enums (bool) – Toggles whether enum values are displayed with their short description in the result DataFrame.
add_units (bool) – Adds units available in the meta data to the result DataFrame. Care should be taken, because not every statistic specifies these correctly. When in doubt, refer to the statistic description.
remove_duplicates (bool) – Removes duplicates from the query results, i.e. if the exact same number has been reported for the same statistic, year, region etc. from the same source, the repeated entries are removed. Such duplicates are sometimes caused on the API side, and this is a convenience feature to remove them. The removal happens before several different statistics are potentially joined. Unless diagnosing the API, the default (True) is generally in the user's interest.
- Raises
RuntimeError – If the query fails.
- Returns
A DataFrame with the queried data.
- Return type
DataFrame
datenguidepy.query_execution module
- class datenguidepy.query_execution.ExecutionResults(query_results: List[Dict[str, Any]], meta_data: Dict[str, Union[Dict[str, str], Dict[str, Dict[Optional[str], str]]]])
Bases: tuple
The results of a query together with the corresponding meta data.
- property meta_data
Alias for field number 1
- property query_results
Alias for field number 0
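The "Alias for field number" entries indicate a named tuple: each result can be read by attribute name or by position. A minimal sketch using typing.NamedTuple follows. SketchExecutionResults is a stand-in defined here for illustration, its meta_data type is simplified compared to the signature above, and the sample values are made up.

```python
from typing import Any, Dict, List, NamedTuple


class SketchExecutionResults(NamedTuple):
    """Stand-in with the same two fields as ExecutionResults."""

    query_results: List[Dict[str, Any]]  # field number 0
    meta_data: Dict[str, Any]  # field number 1 (type simplified)


result = SketchExecutionResults(
    query_results=[{"id": "11", "name": "Berlin"}],
    meta_data={"statistics": {"STAT1": "hypothetical description"}},
)

# Attribute access and positional access refer to the same objects.
print(result.query_results is result[0])
print(result.meta_data is result[1])

# Being a tuple, the result can also be unpacked.
query_results, meta_data = result
print(query_results[0]["name"])
```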
- class datenguidepy.query_execution.FieldMetaDict
Bases: dict
[description]
- class datenguidepy.query_execution.GraphQlSchemaMetaDataProvider(endpoint=None)
Bases: object
The GraphQlSchema meta data provider helps to obtain meta data about the structure of the GraphQL API. As such it helps to provide information on how structurally correct queries are built. It does not directly supply information about statistics.
- REQUEST_HEADER: Dict[str, str] = {'Content-Type': 'application/json'}
- endpoint: str = 'https://api-next.datengui.de/graphql'
- get_type_info(graph_ql_type, verbose=False)
Returns a JSON structure which at the top level is a dict with all the fields of the type.
- Parameters
graph_ql_type (str) – [description]
verbose (bool, optional) – [description], defaults to False
- Returns
[description]
- Return type
Optional[TypeMetaData]
- class datenguidepy.query_execution.QueryExecutioner(alternative_endpoint=None, statistics_meta_data_provider=None)
Bases: object
Queries the Datenguide API for data and meta data.
- Parameters
alternative_endpoint (Optional[str], optional) – [description], defaults to None
- Returns
[description]
- Return type
None
- REQUEST_HEADER: Dict[str, str] = {'Content-Type': 'application/json'}
- endpoint: str = 'https://api-next.datengui.de/graphql'
- get_type_info(graph_ql_type, verbose=False)
Returns a JSON structure which at the top level is a dict with all the fields of the type.
- Parameters
graph_ql_type (str) – [description]
verbose (bool, optional) – [description], defaults to False
- Returns
[description]
- Return type
Optional[TypeMetaData]
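Type information of this kind is typically obtained through GraphQL introspection. The sketch below builds such a request with the endpoint and header documented above, without sending it. It is an assumption about how a type lookup could be phrased, not the query that datenguidepy actually issues; TYPE_QUERY, build_type_info_request and the type name "Region" are hypothetical.

```python
import json
import urllib.request

ENDPOINT = "https://api-next.datengui.de/graphql"
REQUEST_HEADER = {"Content-Type": "application/json"}

# Standard GraphQL introspection on a single named type.
TYPE_QUERY = """
query TypeInfo($name: String!) {
  __type(name: $name) {
    kind
    fields { name }
    enumValues { name description }
  }
}
"""


def build_type_info_request(graph_ql_type: str) -> bytes:
    """Serialise the introspection query and its variable as a JSON body."""
    payload = {"query": TYPE_QUERY, "variables": {"name": graph_ql_type}}
    return json.dumps(payload).encode("utf-8")


body = build_type_info_request("Region")  # "Region" is an assumed type name
request = urllib.request.Request(
    ENDPOINT, data=body, headers=REQUEST_HEADER, method="POST"
)
# Nothing is sent here; urllib.request.urlopen(request) would perform the call.
print(json.loads(body)["variables"])
```

The three selected parts (kind, fields, enumValues) correspond to the three members of TypeMetaData (kind, fields, enum_values).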
- run_query(query)
[summary]
- Parameters
query ([type]) – [description]
- Returns
[description]
- Return type
Optional[List[ExecutionResults]]
- class datenguidepy.query_execution.StatisticsGraphQlMetaDataProvider(endpoint=None)
Bases: object
Statistics meta data providers supply information about details pertaining to certain statistics that can be obtained via the API. This type of meta information is not API specific and can be obtained from different sources. This particular data provider uses GraphQL meta data information to provide results.
- class datenguidepy.query_execution.StatisticsMetaDataProvider(*args, **kwds)
Bases: Protocol
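Because StatisticsMetaDataProvider derives from Protocol, any class with matching members can act as a provider without inheriting from it. The sketch below shows this structural typing with typing.Protocol. The members of the real protocol are not listed in this reference, so the method name get_stat_description, both classes and the sample data are assumptions made for illustration.

```python
from typing import Dict, List, Protocol


class SketchStatisticsProvider(Protocol):
    """Hypothetical protocol; the method name is an assumption."""

    def get_stat_description(self, stat: str) -> str:
        ...


class InMemoryProvider:
    """Satisfies the protocol structurally, without subclassing it."""

    def __init__(self, descriptions: Dict[str, str]) -> None:
        self._descriptions = descriptions

    def get_stat_description(self, stat: str) -> str:
        return self._descriptions[stat]


def describe_all(provider: SketchStatisticsProvider, stats: List[str]) -> List[str]:
    # Works with any object that offers get_stat_description.
    return [provider.get_stat_description(s) for s in stats]


provider = InMemoryProvider({"STAT1": "hypothetical statistic"})
print(describe_all(provider, ["STAT1"]))
```

This is the design that lets meta data come from different sources, such as a GraphQL endpoint or a schema file, behind one interface.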
- class datenguidepy.query_execution.StatisticsSchemaJsonMetaDataProvider
Bases: object
Statistics meta data providers supply information about details pertaining to certain statistics that can be obtained via the API. This type of meta information is not API specific and can be obtained from different sources. This particular data provider uses the hard copy of a schema file from the SOAP cubes that datenguide extracts from GENESIS and transfers into their API.
- get_query_enum_meta(query_fields_with_types)
- Return type
Dict[str, Dict[Optional[str], str]]
- property stat_names
- class datenguidepy.query_execution.TypeMetaData(kind: str, fields: Optional[Dict[str, Any]], enum_values: Optional[Dict[str, str]])
Bases: tuple
The meta data of a field, which consists of the kind, fields and enum values.
- property enum_values
Alias for field number 2
- property fields
Alias for field number 1
- property kind
Alias for field number 0
datenguidepy.query_helper module
- class datenguidepy.query_helper.ConfigMapping(mapping)
Bases: object
[summary]
- Parameters
mapping (Dict[str, Any]) – [description]
- datenguidepy.query_helper.download_all_regions()
Downloads all current regions and their hierarchy structure.
- Raises
RuntimeError – [description]
RuntimeError – [description]
- Returns
[description]
- Return type
pd.DataFrame
- datenguidepy.query_helper.get_availability_summary()
Summary of available data for region/statistic combinations.
There are many regions and statistics available within the datenguide API and at the original sources. Nonetheless, data is not available for all combinations of statistics and regions. Furthermore, some statistics might have been discontinued after a certain point in time.
To help with the search for available statistics, the function provides results from an availability analysis for all statistics and all regions at the nuts1, nuts2 and nuts3 levels. It returns the results of this analysis, which contain for each analyzed region/statistic pair the corresponding id/code, the number of entries in the database and, if applicable, the first and last year in which the statistic appeared.
The results do not contain an overview of the lau regions, nor an overview of possible drilldowns in statistics, for instance whether a statistic is available for men and women individually in addition to the combined population.
- Return type
DataFrame
- Returns
Table with available statistics.
- datenguidepy.query_helper.get_regions()
List of all the regions and their hierarchy structure.
This function returns a DataFrame of all regions. It contains the name of each region and its id; the latter is required to build queries. Additionally, the hierarchy structure is described by listing the parent region of each region. Furthermore, each region's statistical classification (nuts/lau) is provided to allow for more filter options.
For performance reasons this is simply read from disk, since the regions are not expected to change significantly over time. Nonetheless, an up-to-date DataFrame can be obtained with download_all_regions.
- Return type
DataFrame
- Returns
DataFrame with all regions.
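Since every region lists its parent, the hierarchy can be walked in both directions. The sketch below does this with a plain dict instead of the DataFrame. It is an illustration only: the helper names are hypothetical, the handful of ids is a made-up excerpt following the id and parent pattern of the region table, and treating "DG" as a root entry without a parent is an assumption. Whether the library's own hierarchy helpers behave the same way (for example, whether siblings include the region itself) is not stated in this reference.

```python
from typing import Dict, List, Optional

# region_id -> parent id, mirroring the "parent" column (illustrative excerpt).
PARENTS: Dict[str, Optional[str]] = {
    "DG": None,
    "09": "DG",
    "11": "DG",
    "093": "09",
    "094": "09",
}


def ancestors(region_id: str) -> List[str]:
    """Follow parent links upwards until the root is reached."""
    chain = []
    parent = PARENTS[region_id]
    while parent is not None:
        chain.append(parent)
        parent = PARENTS[parent]
    return chain


def descendants(region_id: str) -> List[str]:
    """Collect all regions below region_id, one level at a time."""
    found: List[str] = []
    frontier = [region_id]
    while frontier:
        children = [r for r, p in PARENTS.items() if p in frontier]
        found.extend(children)
        frontier = children
    return found


def same_parent(region_id: str) -> List[str]:
    """Regions sharing the parent of region_id, including region_id itself."""
    parent = PARENTS[region_id]
    return [r for r, p in PARENTS.items() if p == parent]


print(ancestors("093"))
print(descendants("DG"))
print(same_parent("093"))
```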
- datenguidepy.query_helper.get_statistics(search=None, stat_meta_data_provider=None, target_language='de', translation_provider=None)
List of all the currently available statistics.
This function returns a DataFrame of all available statistics. It contains the statistic code, which is required by the queries. It also contains a short and a long description of each statistic. By default it returns all available statistics, but it also has the option to provide a search keyword in advance.
The original statistic descriptions are in German, but the function can also return a machine-translated English version of these descriptions.
- Parameters
search (Optional[str]) – Search term used for a case-insensitive search in the long description.
translation_provider (Optional[TranslationProvider]) – Object used for translating the statistics. Defaults to the default translation provider if None.
target_language (str) – Language to translate the statistic descriptions to. Possible values are currently 'de' and 'en' for the default translation provider.
stat_meta_data_provider – Source object used to obtain the statistic descriptions. Uses the global default if missing.
- Return type
DataFrame
- Returns
Table with available statistics.
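The search parameter is described as a case-insensitive match against the long description. A minimal sketch of that kind of filter is shown below, using a list of dicts in place of the DataFrame. The statistic codes, the descriptions, the key names and the helper search_statistics are invented for the example, and plain substring matching is an assumption about the matching rule.

```python
from typing import Dict, List, Optional

# Made-up entries standing in for the table of statistics.
STATISTICS: List[Dict[str, str]] = [
    {"statistic": "STAT1", "long_description": "Bevölkerungsstand nach Geschlecht"},
    {"statistic": "STAT2", "long_description": "Verfügbares Einkommen der privaten Haushalte"},
]


def search_statistics(search: Optional[str] = None) -> List[Dict[str, str]]:
    """Return all entries, or only those whose long description contains the term."""
    if search is None:
        return list(STATISTICS)
    needle = search.casefold()
    return [s for s in STATISTICS if needle in s["long_description"].casefold()]


print([s["statistic"] for s in search_statistics("EINKOMMEN")])
print(len(search_statistics()))
```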
- datenguidepy.query_helper.hirachy_down(highest_ids, lowest_level='lau', hirachy_frame=ALL_REGIONS)
[summary]
- Parameters
highest_ids (str) – [description]
lowest_level (str, optional) – [description], defaults to “lau”
hirachy_frame (pd.DataFrame, optional) – [description], defaults to ALL_REGIONS
- Raises
RuntimeError – [description]
RuntimeError – [description]
- Returns
[description]
- Return type
pd.DataFrame
- datenguidepy.query_helper.hirachy_up(lowestids, hirachy_frame=ALL_REGIONS)
[summary]
- Parameters
lowestids (str) – [description]
hirachy_frame (pd.DataFrame, optional) – [description], defaults to ALL_REGIONS
- Raises
RuntimeError – [description]
RuntimeError – [description]
- Returns
[description]
- Return type
pd.DataFrame
- datenguidepy.query_helper.siblings(region_id, hirachy_frame=ALL_REGIONS)
[summary]
- Parameters
region_id (pd.DataFrame) – [description]
hirachy_frame (pd.DataFrame, optional) – [description], defaults to ALL_REGIONS
- Raises
RuntimeError – [description]
RuntimeError – [description]
- Returns
[description]
- Return type
pd.DataFrame