Purpose of the article: This blog explains Static and Dynamic Indexing in Elasticsearch, covering their definitions, use cases, and how each is implemented in the engine.
Intended Audience: Developers working with Python and Elasticsearch.
Tools and Technology: Python and Elasticsearch
Keywords: Python and Elasticsearch API, Content Management System
Introduction:
Elasticsearch is a popular open-source search and analytics engine known for handling vast amounts of data and enabling real-time search. Its dynamic indexing feature is particularly valuable for adapting to evolving data structures. This blog delves into Static and Dynamic Indexing in Elasticsearch and explores how dynamic indexing lets you build agile, adaptable real-time search engines using Python and Elasticsearch.
Understanding Static Indexing
When you are working with Elasticsearch, establishing a structured index mapping is crucial to ensure that data is stored and retrieved accurately. Let us take a closer look at a sample mapping.
mapping = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "employeeIdentifier": {"type": "keyword"},
            "employeeNumber": {"type": "keyword"},
            "fromDate": {"type": "date"},
            "toDate": {"type": "date"},
            "requestedOn": {"type": "date"},
            "note": {"type": "text"},
            "status": {"type": "integer"},
            "selection": {
                "type": "nested",
                "properties": {
                    "leaveTypeIdentifier": {"type": "keyword"},
                    "leaveTypeName": {"type": "keyword"},
                    "count": {"type": "float"},
                    "duration": {"type": "float"}
                }
            },
            "sessionType": {"type": "keyword"}
        }
    }
}
This mapping outlines the structure of documents to be stored in the index:
- id, employeeIdentifier, and employeeNumber are keyword fields that serve as identifiers and are matched exactly.
- fromDate, toDate, and requestedOn are date fields, which can also capture a time component.
- note is a text field, suited to longer textual data and full-text search.
- status is an integer field representing specific conditions.
- selection is a nested field designed to handle leave-related data (leave type, count, and duration).
- sessionType is a keyword field defining the type of session.
Because every field type is declared up front, the data in the index is handled predictably, making it suitable for efficient search and analysis.
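As a minimal sketch of how such a static mapping might be applied from Python, the snippet below creates an index and stores one document. It assumes the official elasticsearch-py client in version 8.x (keyword arguments `mappings=` and `document=`); the index name, the sample values, and the helper `create_index_and_store` are hypothetical and not part of the original mapping.

```python
# Assumed setup (not shown in the article), using elasticsearch-py 8.x:
#   from elasticsearch import Elasticsearch
#   client = Elasticsearch("http://localhost:9200")

INDEX_NAME = "leave-requests"  # hypothetical index name

# A trimmed copy of the static mapping discussed above.
static_mapping = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "fromDate": {"type": "date"},
            "toDate": {"type": "date"},
            "note": {"type": "text"},
            "status": {"type": "integer"},
            "selection": {
                "type": "nested",
                "properties": {
                    "leaveTypeName": {"type": "keyword"},
                    "count": {"type": "float"},
                },
            },
        }
    }
}

# Illustrative values only.
sample_document = {
    "id": "req-001",
    "fromDate": "2024-01-10",
    "toDate": "2024-01-12",
    "note": "Family event",
    "status": 1,
    "selection": [{"leaveTypeName": "Casual", "count": 3.0}],
}


def create_index_and_store(client, index_name, mapping, document):
    """Create the index if it is missing, then store one document in it."""
    if not client.indices.exists(index=index_name):
        client.indices.create(index=index_name, mappings=mapping["mappings"])
    return client.index(index=index_name, id=document["id"], document=document)
```

With a running cluster, calling `create_index_and_store(client, INDEX_NAME, static_mapping, sample_document)` would create the index on first use and reuse it afterwards.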
Understanding Dynamic Indexing:
Dynamic indexing in Elasticsearch offers a flexible way to map fields in documents based on patterns. This method utilizes dynamic templates. For instance, consider these templates:
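mapping = {
    "mappings": {
        "dynamic_templates": [
            {
                # Any field under customFields is stored as an exact value.
                "strings_as_keywords": {
                    "path_match": "customFields.*",
                    "mapping": {
                        "type": "keyword"
                    }
                }
            },
            {
                # Any field whose full path ends in .id is stored as an exact value.
                "nested_objects": {
                    "path_match": "*.id",
                    "mapping": {
                        "type": "keyword"
                    }
                }
            }
        ]
    }
}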
- Strings as keywords: Fields under the “customFields” path are mapped to the “keyword” data type, so they are treated as exact values regardless of their specific names. This is particularly useful for fields serving as identifiers or tags.
- Nested objects: Fields whose full path ends with “.id” are mapped as “keyword”. This treats fields such as “example.id” as exact values, which is the usual choice for identifiers. Despite its name, this template does not create fields of the “nested” type.
Dynamic templates simplify field mapping, ensuring that specific field patterns are treated consistently without manual type specification. This is particularly advantageous when dealing with diverse and evolving data.
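To make the matching behaviour concrete, here is a small pure-Python sketch that imitates how a field path is checked against the templates in order, with the first match winning. It is only an approximation: it uses Python's `fnmatch` for the `*` wildcard (which also interprets `?` and `[...]`, unlike Elasticsearch's simple pattern matching), and the helper `resolve_template` is a hypothetical name, not an Elasticsearch API.

```python
from fnmatch import fnmatchcase

# The two dynamic templates from the mapping above.
dynamic_templates = [
    {"strings_as_keywords": {"path_match": "customFields.*",
                             "mapping": {"type": "keyword"}}},
    {"nested_objects": {"path_match": "*.id",
                        "mapping": {"type": "keyword"}}},
]


def resolve_template(field_path, templates):
    """Return (template_name, mapping) for the first matching template, else None."""
    for template in templates:
        for name, spec in template.items():
            if fnmatchcase(field_path, spec["path_match"]):
                return name, spec["mapping"]
    # No template matched: Elasticsearch would fall back to its default dynamic mapping.
    return None


for path in ["customFields.color", "employee.id", "customFields.id", "note"]:
    print(path, "->", resolve_template(path, dynamic_templates))
```

Note that a path such as `customFields.id` matches both patterns; because templates are evaluated in order, the first one listed is the one applied.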
Python Usage in Elasticsearch:
Python, a widely used language, pairs well with Elasticsearch for streamlined data interaction. The official Elasticsearch Python library facilitates connecting to clusters, CRUD operations, and complex queries. Because the client works with plain Python dictionaries, data can be manipulated and analyzed in Python before it is indexed. Python scripts are therefore invaluable for data pre-processing, making data-related tasks more efficient.
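The pre-processing step can be sketched as follows. The functions `clean_record` and `build_actions` are hypothetical helpers written for illustration; the action format (`_index`, `_id`, `_source`) is the one accepted by `elasticsearch.helpers.bulk`, and the index name and sample record are assumptions.

```python
def clean_record(record):
    """Strip surrounding whitespace from strings and drop empty values."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value is None or value == "":
            continue  # skip fields that carry no information
        cleaned[key] = value
    return cleaned


def build_actions(records, index_name):
    """Yield one bulk action per cleaned record."""
    for record in records:
        doc = clean_record(record)
        yield {"_index": index_name, "_id": doc["id"], "_source": doc}


raw_records = [  # illustrative input only
    {"id": " req-001 ", "note": "  Family event ", "status": 1, "sessionType": ""},
]

# With a live cluster (assumed setup, elasticsearch-py installed):
#   from elasticsearch import Elasticsearch, helpers
#   client = Elasticsearch("http://localhost:9200")
#   helpers.bulk(client, build_actions(raw_records, "leave-requests"))
for action in build_actions(raw_records, "leave-requests"):
    print(action)
```

Keeping the cleaning logic in plain functions means it can be tested without a cluster and reused for both single-document and bulk indexing.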
Use Case for Dynamic Indexing in Elasticsearch:
Dynamic indexing in Elasticsearch is particularly useful when dealing with dynamic and evolving data structures. Let us consider a real-world use case to illustrate its value:
On an e-commerce platform, new products appear frequently and their attributes evolve over time. Dynamic indexing lets the index adapt to these changes:
- Product Attributes: New attributes like color, size, and material can be added or modified without manual index mapping changes. Dynamic templates automatically identify and index these attributes.
- Product Categories: As the platform grows, new product categories with unique properties can be introduced without reconfiguring the index structure each time a category is added.
- Pricing and Availability: Prices and availability change frequently. These fields can be updated, and related new fields introduced, without redefining the mapping, so the real-time search engine stays agile and adaptable to changing product data.
Dynamic indexing streamlines the handling of evolving data by removing the need for frequent mapping adjustments. For businesses whose data changes regularly, this reduces downtime and maintenance effort while keeping search efficient.
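One possible way to apply this to product data is sketched below: fixed fields get explicit types, and any attribute not known in advance is placed under `customFields` so that a dynamic template picks it up. The field names in `CORE_FIELDS`, the helper `to_product_document`, and the sample product are assumptions made for illustration.

```python
CORE_FIELDS = {"id", "name", "price", "inStock"}  # hypothetical fixed fields

# Static types for known fields, plus one dynamic template for everything else.
product_mapping = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "name": {"type": "text"},
            "price": {"type": "float"},
            "inStock": {"type": "boolean"},
        },
        "dynamic_templates": [
            {"strings_as_keywords": {"path_match": "customFields.*",
                                     "mapping": {"type": "keyword"}}}
        ],
    }
}


def to_product_document(raw):
    """Keep core fields at the top level and move all others under customFields."""
    doc = {key: value for key, value in raw.items() if key in CORE_FIELDS}
    extras = {key: value for key, value in raw.items() if key not in CORE_FIELDS}
    if extras:
        doc["customFields"] = extras
    return doc


raw_product = {  # illustrative values only
    "id": "p-1", "name": "Shirt", "price": 19.99, "inStock": True,
    "color": "blue", "material": "cotton",
}
print(to_product_document(raw_product))
```

When a product later arrives with a new attribute, for example a size, it lands under `customFields` and is mapped by the template on first sight, with no change to `product_mapping`.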
Conclusion
Mastering Static and Dynamic Indexing in Elasticsearch is vital for effective data management and retrieval. Static mappings give precise control over how known fields are stored and searched, while dynamic templates let the index adapt to changing data with less maintenance effort.
References
elasticsearch-py.readthedocs.io
https://en.wikipedia.org/wiki/Elasticsearch
Author Bio:
Pavani ATLA
Associate Software Engineer, Data Science-Analytics
With 2.5 years of technical experience in Python, SQL, and AWS, I specialize in preprocessing and transforming both structured and unstructured data. I am enthusiastic about discovering and learning new technologies, with the aim of enhancing business performance through data science.