DataHubGc
CLI-based Ingestion
Install the Plugin
The datahub-gc source works out of the box with acryl-datahub.
Config Details
Note that a dot (.) in a field name denotes a nested field in the YAML recipe.
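To illustrate the dot convention, here is a minimal sketch that expands dotted paths into the nested mapping a YAML recipe would contain. It assumes the standard DataHub recipe layout, where a source is selected with source.type and its options sit under source.config. The helper name `expand_dotted` is hypothetical and is not part of DataHub. A complete recipe will usually need further settings (for example a sink or server connection), which this page does not cover, so they are omitted.

```python
from typing import Any, Dict


def expand_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: turn {"a.b.c": v} into {"a": {"b": {"c": v}}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Reuse an existing sub-mapping, or create one on first use.
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(
                    f"{dotted_key!r} conflicts with a non-nested value at {part!r}"
                )
        node[leaf] = value
    return nested


# Assumed recipe shape: source.type selects the source, source.config holds its options.
recipe = expand_dotted(
    {
        "source.type": "datahub-gc",
        "source.config.truncate_indices": False,
        "source.config.truncate_index_older_than_days": 7,
    }
)
print(recipe)
```

The printed mapping has the same nesting as a YAML recipe whose source block contains type: datahub-gc and a config block with the two overridden fields.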
| Field | Type | Default | Description |
|---|---|---|---|
| cleanup_expired_tokens | boolean | True | Whether to clean up expired tokens. |
| truncate_index_older_than_days | integer | 30 | Indices older than this many days are truncated. |
| truncate_indices | boolean | True | Whether to truncate the Elasticsearch indices that can be safely truncated. |
| truncation_sleep_between_seconds | integer | 30 | Number of seconds to sleep between checks while monitoring truncation. |
| truncation_watch_until | integer | 10000 | Keep waiting for index truncation until this many documents are left. |
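The three truncation parameters interact: an age threshold selects what to truncate, and a polling loop waits for the document count to drop. The sketch below models only what the table states and is not DataHub's implementation. The function names and the `remaining_documents` callable (standing in for whatever reports the documents still left in an index) are hypothetical, and treating "until this number of documents are left" as "at most this many" is an assumption.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def truncation_cutoff(older_than_days: int = 30, now: Optional[datetime] = None) -> datetime:
    """Data timestamped before the returned instant counts as older than `older_than_days`."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=older_than_days)


def watch_truncation(
    remaining_documents: Callable[[], int],  # hypothetical: reports documents still left
    watch_until: int = 10000,                # truncation_watch_until
    sleep_between_seconds: int = 30,         # truncation_sleep_between_seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until at most `watch_until` documents remain; return the number of polls made."""
    polls = 0
    while True:
        polls += 1
        if remaining_documents() <= watch_until:
            return polls
        # Still above the threshold: wait before checking again.
        sleep(sleep_between_seconds)
```

Injecting `sleep` keeps the loop testable without real delays. A production loop would likely also need an overall timeout, which this sketch leaves out.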
The JSONSchema for this configuration is inlined below.
```json
{
  "title": "DataHubGcSourceConfig",
  "type": "object",
  "properties": {
    "cleanup_expired_tokens": {
      "title": "Cleanup Expired Tokens",
      "description": "Whether to clean up expired tokens or not",
      "default": true,
      "type": "boolean"
    },
    "truncate_indices": {
      "title": "Truncate Indices",
      "description": "Whether to truncate elasticsearch indices or not which can be safely truncated",
      "default": true,
      "type": "boolean"
    },
    "truncate_index_older_than_days": {
      "title": "Truncate Index Older Than Days",
      "description": "Indices older than this number of days will be truncated",
      "default": 30,
      "type": "integer"
    },
    "truncation_watch_until": {
      "title": "Truncation Watch Until",
      "description": "Wait for truncation of indices until this number of documents are left",
      "default": 10000,
      "type": "integer"
    },
    "truncation_sleep_between_seconds": {
      "title": "Truncation Sleep Between Seconds",
      "description": "Sleep between truncation monitoring.",
      "default": 30,
      "type": "integer"
    }
  },
  "additionalProperties": false
}
```
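The schema implies two practical rules: omitted fields take their defaults, and unknown fields are rejected because additionalProperties is false. The following minimal sketch applies those rules to a source config using an abbreviated copy of the schema (titles and descriptions dropped). It is only an illustration. DataHub performs its own validation when it loads a recipe and may, for example, coerce values differently from this strict check. `resolve_config` and `_check_type` are hypothetical names.

```python
import json
from typing import Any, Dict

# Abbreviated copy of the DataHubGcSourceConfig schema (titles/descriptions omitted).
SCHEMA: Dict[str, Any] = json.loads("""
{
  "type": "object",
  "properties": {
    "cleanup_expired_tokens": {"default": true, "type": "boolean"},
    "truncate_indices": {"default": true, "type": "boolean"},
    "truncate_index_older_than_days": {"default": 30, "type": "integer"},
    "truncation_watch_until": {"default": 10000, "type": "integer"},
    "truncation_sleep_between_seconds": {"default": 30, "type": "integer"}
  },
  "additionalProperties": false
}
""")


def _check_type(name: str, value: Any, json_type: str) -> None:
    if json_type == "boolean":
        ok = isinstance(value, bool)
    elif json_type == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = isinstance(value, int) and not isinstance(value, bool)
    else:
        ok = True  # this schema uses no other JSON types
    if not ok:
        raise TypeError(f"{name} must be of JSON type {json_type!r}, got {value!r}")


def resolve_config(user_config: Dict[str, Any], schema: Dict[str, Any] = SCHEMA) -> Dict[str, Any]:
    """Merge user values over schema defaults, enforcing types and additionalProperties."""
    props = schema["properties"]
    unknown = sorted(set(user_config) - set(props))
    if unknown and schema.get("additionalProperties") is False:
        raise ValueError(f"unknown field(s): {unknown}")
    resolved = {name: spec.get("default") for name, spec in props.items()}
    for name, value in user_config.items():
        spec = props.get(name)
        if spec is not None:
            _check_type(name, value, spec["type"])
        resolved[name] = value
    return resolved
```

For instance, resolve_config({"truncate_index_older_than_days": 7}) keeps the other four defaults and changes only the age threshold, while a misspelled field name raises ValueError instead of being silently ignored.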
Code Coordinates
- Class Name: datahub.ingestion.source.gc.datahub_gc.DataHubGcSource
Questions
If you've got any questions on configuring ingestion for DataHubGc, feel free to ping us on our Slack.