and I will split string Q into K plus 1,

in this case four sub-segments.

So I will basically say okay let me be able split this into four equal sized segments.

Great.

What I will then do is, I will say,

well instead of searching this entire sequence Q in P,

I will search its sub-segments in P. How will this help?

Well, the idea is very simple, right.

So I know that there are three errors,

up to three errors but I don't know where those three errors are.

So all the three errors,

there is a chance that they might basically focus all of

them let's assume in the first of the segments,

or in the worst case,

the errors may basically be distributed,

maybe far from each other or maybe it could cover multiple segments.

But note that if I have at most three errors,

and if I have four segments,

that guarantees to me that there will be at least one segment which will be error free.

There will be at least one segment which will be error free so which can match exactly.

This is great, because what I can do is I can then make sure that

if there is a approximate match with at most K errors then at least one of the segments,

at least one of them must exist in Q exactly.

Exactly. If it exists then that may be a match.

Then I need to go and basically do

the expensive processing to see whether that match does exist or not.

But if such a match does not exist,

what I know is that that cannot be,

there is no chance that there is a match with at most K errors.

So, if such an exact match does not

exist then I can filter out this query pattern and say,

sorry I can tell you that there is no chance that there is at

most K error match between Q and P. I can

tell you without actually paying that expensive process.

So as I said the filtering algorithm,

gives us the quick way to eliminate candidates that are guaranteed not to have a match.

To see whether there's a match if

basically the filtering algorithm say yes while there may be a match,

you actually have to pay the expensive algorithm.

You need to go back and implement the expensive algorithm

to actually check whether that match exists or not.

So that's why we call these filtering algorithms.

Okay, so we have seen one algorithm to basically solve this problem.

Let's see another algorithm.

So this algorithm solves the same problem as we discussed before.

We are given a string P of length N. We are given a pattern Q of

length M and we are trying to see if string P may contain an approximate match,

a match to Q with at most K errors.

So this is the same problem as the first filtering algorithm I solved.

In this case, however,

the process will be a little bit different.

Instead of segmenting the query pattern Q,

what we will do is we will operate on the string P on our data string.

So, let's basically see this with an example.

So we have string P and we have string Q.

So in general and Q with shorters than string P because they

are interested in containment which means they have a long string P

and a short string Q and say this basically the length of

this M the length of this is N. So what we will do is,

we will basically go and slide a window of length M on

our data string for each and every position.

And what we will do is we'll count symbols.

So let's assume that our query string is abac.

What this says that basically have two a's one b and one

c. So we have accounting vectors that counts the symbols.

What we will do is that as we move our window on the data symbol for each position,

we will count the number of symbols.

Instead of trying to match the data string with the Queta string.

What we will do is that for every position,

we will count how many a's there are how many b's there are how many c's there are.

So let's assume that somewhere here there is basically

a window with two a's one b and one c. Let's assume

that there is another window somewhere here again with two a's one b and one

c. Now I don't know whether this window is actually matching this,

I have no idea.

I also don't know whether this window is matching this or not.

I have no idea.

But what I know is that this window here which I say one a,

one b and two c's is not a match. I know of that.

So it means that basically if I am going to pay

the expensive price off trying to match the query string with a given window,

I only need to consider the window that is an exact match in the number of counts.

But remember in this problem we allow upto k errors.

Which means that basically say I allow up to one error