feat(api): dataset fields statistics #1360

Open
wants to merge 8 commits into
base: master
Changes from 7 commits
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
title: DatasetFieldStatistics
type: object
properties:
  min:
    type: number
    description: 'Minimum value of the field. For numbers, this is calculated directly. For strings, this is the length of the shortest string. For arrays, this is the length of the shortest array. For objects, this is the number of keys in the smallest object.'
    nullable: true
  max:
    type: number
    description: 'Maximum value of the field. For numbers, this is calculated directly. For strings, this is the length of the longest string. For arrays, this is the length of the longest array. For objects, this is the number of keys in the largest object.'
    nullable: true
  nullCount:
    type: number
    description: 'How many items in the dataset have a null value for this field.'
    nullable: true
  emptyCount:
    type: number
    description: 'How many items in the dataset have this field `undefined`. Only `undefined` counts as empty, so a value such as an empty string is not considered empty.'
    nullable: true
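The descriptions above can be read as the following minimal Python sketch. It is an illustration of the documented semantics only, not the platform's implementation. The names `measure` and `compute_field_statistics` are hypothetical, a missing key stands in for `undefined`, and skipping booleans is an assumption the schema does not state.

```python
from typing import Any, Dict, List, Optional

_MISSING = object()  # sentinel standing in for JavaScript `undefined`


def measure(value: Any) -> Optional[float]:
    """Return the number compared for min/max, or None if the value has none."""
    if isinstance(value, bool):
        return None  # assumption: booleans are not measured
    if isinstance(value, (int, float)):
        return value  # numbers are compared directly
    if isinstance(value, (str, list, dict)):
        return len(value)  # string length, array length, or number of keys
    return None


def compute_field_statistics(items: List[Dict[str, Any]], field: str) -> Dict[str, Any]:
    """Compute min, max, nullCount and emptyCount for one field (hypothetical helper)."""
    measures = []
    null_count = 0
    empty_count = 0
    for item in items:
        value = item.get(field, _MISSING)
        if value is _MISSING:
            empty_count += 1  # field is absent, i.e. `undefined`
        elif value is None:
            null_count += 1  # field is explicitly null
        else:
            measured = measure(value)
            if measured is not None:
                measures.append(measured)
    return {
        "min": min(measures) if measures else None,
        "max": max(measures) if measures else None,
        "nullCount": null_count,
        "emptyCount": empty_count,
    }
```

Under these assumptions, an empty string contributes a length of 0 to `min` but does not increase `emptyCount`, which matches the wording of the `emptyCount` description.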
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
title: GetDatasetStatisticsResponse
required:
  - data
type: object
properties:
  data:
    type: object
    properties:
      fieldStatistics:
        type: object
        additionalProperties:
          $ref: ./DatasetFieldStatistics.yaml
        description: 'When you configure the dataset [fields schema](https://docs.apify.com/platform/actors/development/actor-definition/dataset-schema/validation), statistics such as `min`, `max`, `nullCount`, and `emptyCount` are measured for each field.
          This property provides the statistics for each field defined in the dataset fields schema.
          <br/><br/>See the dataset field statistics [documentation](https://docs.apify.com/platform/actors/development/actor-definition/dataset-schema/validation#dataset-field-statistics) for more information.'
4 changes: 4 additions & 0 deletions apify-api/openapi/components/tags.yaml
Original file line number Diff line number Diff line change
@@ -758,6 +758,10 @@
  x-legacy-doc-urls:
    - '#/reference/datasets/item-collection'
  x-trait: 'true'
- name: Datasets/Statistics
  x-displayName: Statistics
  x-parent-tag-name: Datasets
  x-trait: 'true'
- name: Request queues
  x-displayName: Request queues
  x-legacy-doc-urls:
1 change: 1 addition & 0 deletions apify-api/openapi/components/x-tag-groups.yaml
Original file line number Diff line number Diff line change
@@ -62,6 +62,7 @@
    - Datasets/Dataset collection
    - Datasets/Dataset
    - Datasets/Item collection
    - Datasets/Statistics
- name: Request queues
  tags:
    - Request queues
2 changes: 2 additions & 0 deletions apify-api/openapi/openapi.yaml
Original file line number Diff line number Diff line change
@@ -566,6 +566,8 @@ paths:
    $ref: 'paths/datasets/datasets@{datasetId}.yaml'
  '/v2/datasets/{datasetId}/items':
    $ref: 'paths/datasets/datasets@{datasetId}@items.yaml'
  '/v2/datasets/{datasetId}/statistics':
    $ref: 'paths/datasets/datasets@{datasetId}@statistics.yaml'
  /v2/request-queues:
    $ref: paths/request-queues/request-queues.yaml
  '/v2/request-queues/{queueId}':
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
get:
  tags:
    - Datasets/Statistics
  summary: Get dataset statistics
  description: |
    Returns statistics for the given dataset.
    Currently provides only [field statistics](https://docs.apify.com/platform/actors/development/actor-definition/dataset-schema/validation#dataset-field-statistics).

  operationId: dataset_statistics_get
  parameters:
    - name: datasetId
      in: path
      description: Dataset ID or `username~dataset-name`.
      required: true
      style: simple
      schema:
        type: string
        example: WkzbQMuFYuamGv3YF
    - name: token
      in: query
      description: |
        API authentication token. It is required only when using the `username~dataset-name` format for `datasetId`.
      style: form
      explode: true
      schema:
        type: string
        example: soSkq9ekdmfOslopH
  responses:
    '200':
      description: ''
      content:
        application/json:
          schema:
            $ref: "../../components/schemas/datasets/GetDatasetStatisticsResponse.yaml"
          example:
            data:
              fieldStatistics:
                name:
                  nullCount: 122
                price:
                  min: 59
                  max: 89
  # TODO: add client methods
  # x-js-parent: DatasetClient
  # x-js-name: statistics
  # x-js-doc-url: https://docs.apify.com/api/client/js/reference/class/DatasetClient#statistics
  # x-py-parent: DatasetClientAsync
  # x-py-name: statistics
  # x-py-doc-url: https://docs.apify.com/api/client/python/reference/class/DatasetClientAsync#statistics
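Since the client methods are still a TODO, a caller might reach the new endpoint with plain HTTP in the meantime. The sketch below uses only the Python standard library. The base URL `https://api.apify.com` and the helper names `build_statistics_url`, `parse_field_statistics`, and `fetch_field_statistics` are assumptions made for illustration; the path, the `token` query parameter, and the response shape are taken from the spec above.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_BASE_URL = "https://api.apify.com"  # assumption: public API host


def build_statistics_url(dataset_id: str, token: Optional[str] = None) -> str:
    """Build the URL for GET /v2/datasets/{datasetId}/statistics."""
    url = f"{API_BASE_URL}/v2/datasets/{quote(dataset_id, safe='')}/statistics"
    if token:
        # Per the spec, the token is needed only for `username~dataset-name` IDs.
        url += "?" + urlencode({"token": token})
    return url


def parse_field_statistics(body: str) -> Dict[str, Dict[str, Any]]:
    """Extract `data.fieldStatistics` from a JSON response body."""
    payload = json.loads(body)
    return payload["data"]["fieldStatistics"]


def fetch_field_statistics(dataset_id: str, token: Optional[str] = None) -> Dict[str, Dict[str, Any]]:
    """Call the endpoint and return the per-field statistics (performs a network request)."""
    with urlopen(build_statistics_url(dataset_id, token)) as response:
        return parse_field_statistics(response.read().decode("utf-8"))
```

Keeping URL building and response parsing separate from the network call lets both be checked against the example values in the spec without contacting the API.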
2 changes: 1 addition & 1 deletion package-lock.json
