Azure Cognitive Search supports fuzzy search, a type of query that compensates for typos and misspelled terms in the input string. It does this by scanning for terms having a similar composition. Expanding search to cover near-matches has the effect of auto-correcting a typo when the discrepancy is just a few misplaced characters.
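To make "similar composition" concrete, here is a minimal Python sketch of edit distance, the usual measure behind this kind of near-match. It counts insertions, deletions, substitutions, and swaps of adjacent characters. It is only an illustration of the idea, not the service's own implementation, and the function names and sample terms are made up for this example.

```python
def edit_distance(a, b):
    """Number of single-character edits (insert, delete, substitute,
    or swap of two adjacent characters) needed to turn a into b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters, e.g. "form" vs "from"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_matches(query, terms, max_edits=2):
    """Return the terms within max_edits of the query, ignoring case."""
    q = query.lower()
    return [t for t in terms if edit_distance(q, t.lower()) <= max_edits]


# Hypothetical indexed terms
print(fuzzy_matches("Beth", ["Both", "Bath", "Seattle"], max_edits=1))
```

With a budget of one edit, the misspelled query still reaches terms that differ by a single character, while unrelated terms are left out.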
Every document in a search result set is assigned a relevance score. The function of the relevance score is to rank higher those documents that best answer a user question as expressed by the search query. The score is computed based on statistical properties of terms that matched. At the core of the scoring formula is TF/IDF (term frequency-inverse document frequency). In queries containing rare and common terms, TF/IDF promotes results containing the rare term.
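As a rough illustration of why the rare term wins, the sketch below computes a textbook TF/IDF weight over a tiny made-up corpus. The formula here (term frequency times the logarithm of the inverse document frequency) is the classic simplified form, chosen for clarity. It is an assumption for this example and not the exact scoring formula the service applies.

```python
import math


def tf_idf(term, doc, corpus):
    """Textbook TF/IDF weight of a term in one document of a corpus."""
    tokens = doc.lower().split()
    tf = tokens.count(term) / len(tokens)                       # term frequency
    df = sum(1 for d in corpus if term in d.lower().split())    # document frequency
    idf = math.log(len(corpus) / df) if df else 0.0             # rarity of the term
    return tf * idf


# Hypothetical documents: "hotel" is common, "spa" is rare
corpus = [
    "luxury hotel near the beach",
    "budget hotel near the airport",
    "historic hotel with a spa",
]

doc = corpus[2]
print("hotel:", tf_idf("hotel", doc, corpus))
print("spa:  ", tf_idf("spa", doc, corpus))
```

Because "hotel" appears in every document, its inverse document frequency is zero and it contributes nothing to the ranking, while "spa" appears in only one document and so carries all the weight.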
Architecturally, a search service sits in between the external data stores that contain your un-indexed data, and your client app that sends query requests to a search index and handles the response.
This post covers the following steps:
- Create Storage Account and upload data into Azure Blob Storage
- Create Azure Cognitive Search and Integrate with Azure Blob Storage account
- Use Postman to fuzzy search and retrieve data from Azure Search Index
The step-by-step guide below shows how to enable fuzzy search over your data.
Create Storage Account and upload data into Azure Blob Storage
1. Create a storage account from the Azure portal: in the Azure Marketplace, search for Storage Account and select Create.
2. Choose the subscription and resource group in which the storage account will be deployed. Enter a unique storage account name and choose the location nearest to where it will be used. To keep costs low, choose Standard performance and LRS (locally redundant storage) redundancy.
3. In the storage account, go to Containers and add a container. Enter a name for the container and set the public access level to Blob.
4. The container is created.
5. Click the container, choose the Upload option, and upload the data.json file.
The file must contain a JSON array so that the index is able to consume the data.
For reference, download the provided data.json file.
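The contents of data.json are not reproduced in this post, so the sketch below uses hypothetical field names (id, title, category) purely to show the expected shape: one top-level JSON array in which each element is an object that becomes one search document. The helper names are also assumptions for this example.

```python
import json

# Hypothetical records; the real data.json may use different fields
records = [
    {"id": "1", "title": "Bath salts", "category": "home"},
    {"id": "2", "title": "Both sides of the story", "category": "books"},
]


def to_json_array(items):
    """Serialize a list of dicts as a single top-level JSON array."""
    return json.dumps(items, indent=2)


def is_json_array(text):
    """Check that text parses to a list of JSON objects."""
    parsed = json.loads(text)
    return isinstance(parsed, list) and all(isinstance(r, dict) for r in parsed)


payload = to_json_array(records)
print(payload)
print("valid JSON array:", is_json_array(payload))
```

Running a check like this before uploading can catch the common mistake of saving a single object, or one object per line, instead of an array.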
We have now created a storage account and uploaded data to Azure Blob Storage. Next, we will look at how to access this data.
Create Azure Cognitive Search and Integrate with Azure Blob Storage account
1. Create an Azure Cognitive Search service from the Azure portal: in the Azure Marketplace, search for Azure Cognitive Search and select Create.
2. Choose the subscription and resource group in which the service will be deployed. Enter a unique service name and choose the location nearest to where it will be used. The Free pricing tier is sufficient for this walkthrough.
3. The Free tier comes with the usage limits shown below. If we need more indexes or data storage, we can choose a higher pricing tier.
4. Import data into an Azure Search index: go to the search service that was created and click Import data.
5. Connect to your data and fill in the required fields.
For the connection string, choose an existing connection, then select the storage account and container that hold the data.json file uploaded in the steps above.
6. Move to the Customize target index section. The data is read and the fields are extracted from it.
7. Now, select which fields should be retrievable, filterable, sortable, facetable, or searchable, and choose the analyzer. The default is Standard Lucene.
8. Move to the next step, create an indexer, and click Create.
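The portal wizard builds the index for you, but it can help to see how the choices in step 7 look as an index definition. The sketch below assembles one as a Python dictionary using the field attributes of the service's REST API (key, searchable, filterable, sortable, facetable, retrievable, analyzer). The index name, the field names, and the make_field helper are hypothetical and only serve this example.

```python
import json


def make_field(name, edm_type="Edm.String", key=False, searchable=False,
               filterable=False, sortable=False, facetable=False,
               retrievable=True, analyzer=None):
    """Build one field entry for an index definition."""
    field = {
        "name": name,
        "type": edm_type,
        "key": key,
        "searchable": searchable,
        "filterable": filterable,
        "sortable": sortable,
        "facetable": facetable,
        "retrievable": retrievable,
    }
    if analyzer is not None:
        # Analyzers only make sense on fields that are full-text searchable
        if not searchable:
            raise ValueError("an analyzer only applies to searchable fields")
        field["analyzer"] = analyzer
    return field


index_definition = {
    "name": "fuzzy-demo-index",  # hypothetical index name
    "fields": [
        make_field("id", key=True),
        make_field("title", searchable=True, sortable=True,
                   analyzer="standard.lucene"),
        make_field("category", filterable=True, facetable=True),
    ],
}

print(json.dumps(index_definition, indent=2))
```

Only fields marked searchable take part in full-text queries, so fuzzy matching applies to those fields and not to fields that are merely filterable or facetable.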
Below are the usage details after importing data into the index:
We have now created an index and connected it to the data source. Next, we will look at how to query Azure Cognitive Search from other applications.
Use Postman to fuzzy search and retrieve data from Azure Search Index
1. Go to Indexes and click the index that was created.
2. Copy the search index Request URL.
3. An api-key must be provided in the request header for authentication. Go to Keys and copy the Primary admin key. A query key also works for read-only queries.
4. Open Postman. To query the index, choose the HTTP GET method, paste the Request URL, and add the api-key header.
5. Now we can search the data by providing the search text in the query parameters. Let's search for Beth. No results are returned, because this exact term does not appear in our data. This shows why fuzzy search is helpful when a search term is misspelled.
6. To run a fuzzy search:
- Set the full Lucene parser on the query (queryType=full).
- Append the tilde (~) operator to the end of the whole term (search=<string>~), optionally followed by an edit distance. For example, Beth~1 runs a fuzzy search on the term Beth with an edit distance of 1, so it matches words such as Both, Bath, and Beth.
7. The query now returns matching data. The default edit distance is 2, which is also the maximum. A value of ~0 signifies no expansion (only the exact term is considered a match), while ~1 allows one degree of difference, or one edit. Adjust the edit distance between 0 and 2 to suit your use case.
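The same requests can be assembled outside Postman. The sketch below builds, but does not send, a GET request for either a plain search or a fuzzy search, using only the Python standard library. The service name, index name, and key are placeholders, and the api-version value is an assumption: use whatever version appears in the Request URL you copied.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_VERSION = "2020-06-30"  # assumption: match the version in your Request URL


def build_search_request(service, index, api_key, term, max_edits=None):
    """Build a GET request for the index's docs endpoint.

    With max_edits=None the term is searched as-is. Otherwise the full
    Lucene parser is selected and the term is suffixed with ~<max_edits>.
    """
    params = {"api-version": API_VERSION}
    if max_edits is None:
        params["search"] = term
    else:
        if max_edits not in (0, 1, 2):
            raise ValueError("edit distance must be 0, 1, or 2")
        params["queryType"] = "full"
        params["search"] = f"{term}~{max_edits}"
    url = (f"https://{service}.search.windows.net"
           f"/indexes/{index}/docs?{urlencode(params)}")
    return Request(url, headers={"api-key": api_key}, method="GET")


# Placeholder values; replace with your own service, index, and key
plain = build_search_request("my-service", "my-index", "<api-key>", "Beth")
fuzzy = build_search_request("my-service", "my-index", "<api-key>", "Beth", 1)
print(plain.full_url)
print(fuzzy.full_url)
```

Passing either request to urllib.request.urlopen would send it. The response is JSON, with the matching documents listed under the value key.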
We have seen how to run fuzzy searches over our data using the free tier of Azure Cognitive Search. I hope this post was helpful.
- Implementing Fuzzy Search Using Azure Cognitive Search: A Step by Step Guide - November 5, 2021