Building My Own DSA Question

Building My Own DSA Question

Topic: Bloom filters and count-min sketch.

One company TechCurators has been camed to my college and they gave me this assignment to make the dsa question, so basically company what does is this only they take the assesments of the candidates for the Jobs.
So yeah, In this blog I will give you an overview of my assignment.

We have to frame the question like leetcode questions only so below is the question and also answer for the same, you can go through the assignment.

Que Topic: Bloom Filters and Count-Min Sketch

E-commerce Search Optimization.

Imagine you’re employed at an retail giant, like Amazon aiming to enhance its search feature.With an array of products, in its inventory the goal is to introduce a system that not quickens product searches but also sheds light on the trending search terms.

Problem:
Your task is to use two data models: Bloom filter and minimum count stubs to refine the search and provide frequency estimates for search items.

Requirements:

Bloom Filter:
The Bloom Filter will be used to quickly check whether a search term is likely to be in the product catalog.

• The company wants to reduce the number of database queries for products that don’t exist.
• It should have a bit array size of 1000 and use 3 hash functions.
• Implement the following functions:
-void insert(string keyword): Insert a search term into the Bloom
Filter.
-bool contains(string keyword): Check if a search term is likely in
the product catalog.

Count-Min Sketch:
Count-Min Sketch is used to estimate the popularity of search terms.
• The company wants to know which search terms are the most popular among users.
• It should have 4 rows and 10 columns.
• Implement the following functions:
- void update(string keyword): Update the Count-Min Sketch with a
search term.
- int query(string keyword): Estimate the popularity (frequency) of a
search term.

Specifications:

• Implement both data structures in C++, Java, Python, and C.
• Use the following hash functions for both structures:
-Simple Hash Function: Sum of ASCII values of characters in the keyword
modulo array size.
-Double Hash Function: (hash1(keyword) + i *
hash2(keyword)) % array size, where i is the index of the hash
function.

Input Format:

The input consists of the following:
- The first line indicating the number of test cases, T.
- For each test case:
- The first line containing an integer N, denoting the number of operations.
- Each of the following N lines containing an operation in the format:
- insert <keyword>: Insert a search term into the Bloom Filter.
- contains <keyword>: Check if a search term is likely present in the product catalog.
- update <keyword>: Update the Count-Min Sketch with a search term.
- query <keyword>: Estimate the popularity (frequency) of a search term.”

Output Format:

For each test case, output the following:
For Bloom Filter Operations:
- For each ‘contains’ operation, output 1 if the search term is likely present in the product
catalog; otherwise, output 0.
For Count-Min Sketch Operations:
- For each ‘query’ operation, output the estimated frequency of the search term.”

Test case 1:

Input:
1
7
insert apple
insert banana
query apple
contains orange
update apple
query apple
contains apple

Test case 2:

Input:
2
4
insert apple
insert banana
query apple
contains orange
3
contains apple
contains orange
query banana

Test case 3:

Input:
1
6
insert apple
insert banana
query apple
contains orange
update apple
query apple

Output

Test case 1:
1
0
2
1

Test case 2:
1
0
0
0
0

Test case 3:
1
0
2

Constraints:

• 1 ≤ T ≤ 10: The number of test cases is between 1 and 10.
• 1 ≤ N ≤ 100: The number of search terms in each test case is between 1 and 100.
• Search terms are alphanumeric strings of length up to 100 characters.
• Input is case-insensitive: All input search terms will be in lowercase.
• Bloom Filter should have no false negatives: This means that if the Bloom Filter says a term is not present, it is guaranteed not to be present.
• Count-Min Sketch should provide estimates within 10% of the actual frequency: The estimated frequency provided by the Count-Min Sketch should be within 10% of the actual frequency.

Note:

This problem involves the use of two data models: bloom filters and minimum count plots.
Bloom filters are used for rapid membership testing, while minimum count stubs are used for frequency estimation. The hash functions specified should be used for both structures. The output for each test case should match the provided examples.

Solution:©

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <limits.h>

#define SIZE 1000

#define ROWS 4

#define COLUMNS 10
struct BloomFilter {
int bitArray[SIZE];
int hashSeeds[3];
};
struct CountMinSketch {
int sketch[ROWS][COLUMNS];
int hashSeeds[ROWS];
};
int simpleHash(const char keyword, int size) {
int hashValue = 0;
for (int i = 0; keyword[i] != ‘\0’; i++) {
hashValue = (hashValue + keyword[i]) % size;
}
return hashValue;
}
int doubleHash(const char
keyword, int size, int index) {
int hash1 = simpleHash(keyword, size);
int hash2 = index (1 + simpleHash(keyword, size));
return (hash1 + hash2) % size;
}
void insert(struct BloomFilter
bloomFilter, const char keyword) {
for (int i = 0; i < 3; i++) {
int hashValue = doubleHash(keyword, SIZE, i);
bloomFilter->bitArray[hashValue] = 1;
}
}
int contains(struct BloomFilter
bloomFilter, const char keyword) {
for (int i = 0; i < 3; i++) {
int hashValue = doubleHash(keyword, SIZE, i);
if (bloomFilter->bitArray[hashValue] == 0) {
return 0;
}
}
return 1;
}
void update(struct CountMinSketch
countMinSketch, const char keyword) {
for (int i = 0; i < ROWS; i++) {
int hashValue = doubleHash(keyword, COLUMNS, i);
countMinSketch->sketch[i][hashValue] += 1;
}
}
int query(struct CountMinSketch
countMinSketch, const char keyword) {
int minFrequency = INT_MAX;
for (int i = 0; i < ROWS; i++) {
int hashValue = doubleHash(keyword, COLUMNS, i);
minFrequency = (countMinSketch->sketch[i][hashValue] < minFrequency) ?
countMinSketch->sketch[i][hashValue] : minFrequency;
}
return minFrequency;
}
int main() {
int T;
scanf(“%d”, &T);
for (int t = 0; t < T; t++) {
int N;
scanf(“%d”, &N);
struct BloomFilter bloomFilter;
struct CountMinSketch countMinSketch;
// Initialize BloomFilter and CountMinSketch
for (int i = 0; i < 3; i++) {
bloomFilter.hashSeeds[i] = i + 1;
}
for (int i = 0; i < ROWS; i++) {
countMinSketch.hashSeeds[i] = i + 1;
}
memset(bloomFilter.bitArray, 0, SIZE
sizeof(int));
memset(countMinSketch.sketch, 0, ROWS COLUMNS sizeof(int));
for (int i = 0; i < N; i++) {
char operation[10], keyword[100]; // Adjusted keyword size to 100
scanf(“%s %s”, operation, keyword);
if (strcmp(operation, “insert”) == 0) {
insert(&bloomFilter, keyword);
update(&countMinSketch, keyword);
} else if (strcmp(operation, “contains”) == 0) {
printf(“%d\n”, contains(&bloomFilter, keyword));
} else if (strcmp(operation, “update”) == 0) {
update(&countMinSketch, keyword);
} else if (strcmp(operation, “query”) == 0) {
printf(“%d\n”, query(&countMinSketch, keyword));
}
}
printf(“\n”); // Add a newline after each test case
}
return 0;
}

Thanks for reading, (Hope you enjoyed it)….