Finding resilience in Project Arrow. Explained with a BONUS!

JESPROTECH

This is a collection and merge of different videos I made about Project Arrow. Specifically, the whole video is about how to achieve resilience with Project Arrow and what that can mean for us as developers. The presentation has a fast pace, and the idea is that you come out of watching this video with a fundamental idea of what Project Arrow is, which role it plays in resilience nowadays, and how you can maybe use it and test it in your own adventures. There are a few playlists that made this video possible. The links are listed below:

---

Related playlists:

- https://www.youtube.com/playlist?list=PLw1b-aiSLRZiUc17iOhKZtP2kV9B_4txs
- https://www.youtube.com/playlist?list=PLw1b-aiSLRZiby5qg4kFc3gZ-HyLXgrWW
- https://www.youtube.com/playlist?list=PLw1b-aiSLRZgQj5A4KBQlih3upmpUJYN5
- https://www.youtube.com/playlist?list=PLw1b-aiSLRZgyWJ_4lSmsucUnxbBztDbc

---

Related GitHub repositories:

- https://github.com/jesperancinha/from-paris-to-berlin-circuit-breaker

---

Project's source code:

- https://github.com/jesperancinha/asnsei-the-right-waf

---

As a short disclaimer, I'd like to mention that I'm not associated or affiliated with any of the brands eventually shown, displayed, or mentioned in this video.
---

All my work and personal interests are also discoverable on other sites:

- My Website - https://joaofilipesabinoesperancinha.nl/
- Reddit - https://www.reddit.com/user/jesperancinha
- Credly - https://www.credly.com/users/joao-esperancinha/badges
- Pinterest - https://nl.pinterest.com/jesperancinha/
- Instagram - https://www.instagram.com/joaofisaes/
- Facebook - https://www.facebook.com/joaofisaes/
- Spotify - https://open.spotify.com/user/jlnozkcomrxgsaip7yvffpqqm
- Medium - https://medium.com/@jofisaes
- Daily Motion - https://www.dailymotion.com/jofisaes
- Tumblr - https://www.tumblr.com/blog/jofisaes

---

If you have any questions about this video, please leave a comment in the comment section below and I will be more than happy to help you or discuss any related topic you'd like to discuss.

If you want to discover more about my open-source work, please visit me on GitHub at:

- GitHub - https://github.com/jesperancinha

Transcript

00:00If you are interested in implementing an exponential back-off mechanism because you want to do

00:13multiple tries in your system for whatever it is that you want to do, you may be interested

00:17in knowing that Project Arrow has something straight out of the box that we can use for

00:22that.

00:23And I've implemented a test that will explore that.

00:26If we use this schedule, we can also call this exponential function that will allow

00:32us to configure a multiple-try mechanism.

00:34In this case, we say that the first waiting time for the first try is 100 milliseconds,

00:39and by default, if we go inside, we can see that it configures a factor of two for the

00:44subsequent tries.

00:46Then we make a call to this doWhile, and this doWhile will continue our repetition

00:52for as long as this condition stands.

00:55This condition will check the output of every single one of these repeats, which will work

00:59here as an input for our check, and see if it is greater than zero.

01:04When it no longer is, the repetition will stop and the process will continue.

01:12At the same time that I'm running this test, I'm also saving all the timestamps so then

01:16we can see at the end of the test how much time has actually waited between the multiple

01:20tries.

01:22So what it does here is it takes a book out of a shelf in this counter.

01:26So this is a counter of shelved books.

01:28In this case, it will be a count of five, and then it will decrement for each single

01:34try.

01:35And when it does have zero books on the shelves, it means that we can start a class.

01:39So if we run the test to see what the result is of this code, then we will see that it

01:44will remove all the books from the shelves, all five books from the shelves, and it will

01:48measure all the waiting times.

01:50We can see that they are not exactly double.

01:52They are measured in milliseconds, which can diverge a lot from the exact specific factor.

01:57We also need to take note that repeat is a higher order function that will take our function

02:02and call it later.

02:04But our function is an implementation of a suspend function, meaning that this will only

02:07work in a coroutine scope.
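
The back-off behaviour described above can be sketched in plain Kotlin. This is a hand-rolled illustration under stated assumptions, not the code from the video and not Arrow's `Schedule` implementation: the names `exponentialDelays` and `repeatWhile` are hypothetical, and the sleep function is injectable so that the waiting times can be recorded instead of actually waited for.

```kotlin
// Hand-rolled sketch of exponential back-off; Arrow's Schedule is NOT used here.
// exponentialDelays and repeatWhile are hypothetical names for illustration only.

// Infinite sequence of waiting times: base, base * factor, base * factor^2, ...
fun exponentialDelays(baseMillis: Long, factor: Double = 2.0): Sequence<Long> =
    generateSequence(baseMillis) { previous -> (previous * factor).toLong() }

// Runs the action once, then keeps repeating it (waiting longer each time)
// for as long as keepGoing holds for the latest output.
fun <T> repeatWhile(
    baseMillis: Long,
    factor: Double = 2.0,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    keepGoing: (T) -> Boolean,
    action: () -> T,
): T {
    val delays = exponentialDelays(baseMillis, factor).iterator()
    var output = action()
    while (keepGoing(output)) {
        sleep(delays.next())
        output = action()
    }
    return output
}

fun main() {
    var shelvedBooks = 5
    val waits = mutableListOf<Long>()
    val remaining = repeatWhile(
        baseMillis = 100,
        sleep = { waits.add(it) },               // record the wait instead of sleeping
        keepGoing = { books: Int -> books > 0 }, // continue while books remain on the shelf
    ) { --shelvedBooks }                         // take one book per try
    println("remaining=$remaining waits=$waits") // remaining=0 waits=[100, 200, 400, 800]
}
```

With a real sleep, the measured gaps would only approximate these values, which matches the observation that the waiting times are not exactly double.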

02:09We can implement the Saga design pattern in an orchestrated or in a choreographed way.

02:21But for this example, that doesn't matter.

02:22I just want to show you how to make a basic implementation of the Saga design pattern.

02:26Our example consists of implementing a system that is about someone going to the library

02:31and getting a book, as simple as that.

02:33And to do that, we need to first think about the entities that we need.

02:36We need a user, we need a reserve ticket, and we also need a book.

02:40For these, we need matching repositories.

02:43In this case, I'm not using a real database.

02:45I'm implementing the repositories myself with the help of HashMaps.

02:49Our database looks like this.

02:51It's a fake database, and this database is simply a set of HashMaps.

02:57We have a HashMap for the database books, where we only have two books in the library.

03:01We have a Reservations HashMap, which are the reservations currently in progress in

03:06the library.

03:07And the final one is the users, where I have placed myself in, and these are the users

03:13registered in the library.

03:14The way we get a book from this library is by means of the book service, which is located

03:19right down over here.

03:20And the way it works is by a successive call of different methods.

03:24And the first method that we call is pick a book.

03:26So this simulates the idea of a user going to a shelf and getting a book.

03:32We then register the book to be out of the shelf immediately in our repository.

03:36If something goes wrong in that process, then we have to restore the book, and that means

03:40that the field in shelf goes back to being true.

03:43The second step in our reservation, if the first one is successful, is to reserve the

03:47book.

03:48And in this specific case, how do we do that?

03:50We just add the book count to our user.

03:54So the reserve book does just that, and the return reservation, should something go wrong

03:59during the reservation of a book, will decrease the book count.

04:05And then finally, we register our reservation by means of saving the reservation ticket

04:10that we get when we create a reserve ticket in the previous step.

04:14And if that is unsuccessful, then we need to remove the reservation from the database.

04:18In the three steps, I'm specifically only calling rollback methods to the method that

04:22had been called before.

04:23And this is, of course, taking into account that all of these methods will work successfully,

04:28or if one of them fails, all of these single rollbacks will be called successfully.

04:33And finally, we get our main method, which is get a book with receipt.

04:37So what this, in other words, means is that we pick up the book, we make a reservation,

04:41and we get a receipt out of our reservation.

04:43So the first thing to do is to call saga.

04:46And in saga, in the scope of the saga, we can then call the different methods using

04:50saga functions.

04:52And these functions are called saga steps.

04:54The first saga step is implemented with a successful operation of picking a book with

04:59the ID of that book.

05:00If something fails, it will restore the book.

05:02As we have seen before, this will put the book back in the shelf.

05:05Then when that is successful, it will then make a book reservation, and it will search

05:10the book by ID, and it will add it to the user.

05:12So that means that the user will get an extra book.

05:15But no change will be made in the database when it comes to the reservation.

05:18If something fails, the reservation gets returned, and the book count of the user gets

05:23decreased by a unit of one.

05:24Finally, we make the ticket registration, where then our ticket gets saved to the database.

05:29And finally, if it fails, we unregister the ticket from the database, meaning that the

05:34database reservations gets that ticket reservation removed.
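
The three steps and their rollbacks can be sketched without any library. This is a minimal hand-rolled saga under stated assumptions, not the code from the video and not Arrow's saga DSL: `Saga`, `step`, `transact`, `Library` and `getBookWithReceipt` are hypothetical names, and the library is reduced to three fields. Each completed step registers its compensation, and on failure the compensations run in reverse order before the exception is rethrown.

```kotlin
// Minimal hand-rolled saga; Arrow's saga DSL is NOT used here.
// All names below are hypothetical and chosen for illustration only.
class Saga {
    private val compensations = ArrayDeque<() -> Unit>()

    // Runs the action; only if it succeeds is its compensation registered.
    fun <T> step(action: () -> T, compensation: (T) -> Unit): T {
        val result = action()
        compensations.addFirst { compensation(result) }
        return result
    }

    // Most recently completed step is compensated first.
    fun rollback() = compensations.forEach { it() }
}

fun <T> transact(block: Saga.() -> T): T {
    val saga = Saga()
    return try {
        saga.block()
    } catch (e: Exception) {
        saga.rollback()
        throw e // the failure still reaches the caller
    }
}

// Simplified stand-in for the fake database described above.
class Library {
    var bookInShelf = true
    var userBookCount = 0
    val reservations = mutableListOf<String>()
}

fun getBookWithReceipt(library: Library, failOnRegister: Boolean): String = transact {
    step({ library.bookInShelf = false }, { library.bookInShelf = true })   // pick / restore
    step({ library.userBookCount += 1 }, { library.userBookCount -= 1 })    // reserve / return
    step(
        action = {
            if (failOnRegister) error("reservation repository is down")
            "ticket-1".also { library.reservations.add(it) }                // register
        },
        compensation = { ticket -> library.reservations.remove(ticket) },   // unregister
    )
}

fun main() {
    val library = Library()
    val outcome = runCatching { getBookWithReceipt(library, failOnRegister = true) }
    println(
        "failed=${outcome.isFailure} inShelf=${library.bookInShelf} " +
            "bookCount=${library.userBookCount} reservations=${library.reservations}"
    ) // failed=true inShelf=true bookCount=0 reservations=[]
}
```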

05:37To test this saga engine, which seems to be working fine, I have created a test.

05:42And that test we can find right over here in the resilient package in book lending test.

05:47And here we've got two tests.

05:48The first one is a successful test, and the second one is an unsuccessful test.

05:52The first test will only check if we can make the book reservation, and it will be a successful

05:57operation.

05:58In this case, we will make the reservation of book one to my user.

06:02And book one, as we have seen before, is the Silence of the Kittens.

06:06And so that's the book that I am going to reserve for myself right now.

06:09And we will see that the book is not in shelf anymore, that this is the book in the reservation

06:13ticket.

06:14And finally, we will check that the username matches my username, in this case, the name

06:18of the only user that's available in our database.

06:21And finally, the book count should be one.

06:23Finally, at the end of the test, I'm going to restore everything back using the restore

06:28methods that we have seen before.

06:29I restored the book one, I unregistered the receipt that we have made, and I returned

06:33my own reservation.

06:35So if we run this test, it should be successful, which it is.

06:39The second test is the fail test.

06:40And in the fail test, we are going to let all the operations run until the last one.

06:44And this is to make a more critical test of the Saga implementation.

06:47The first thing we are going to do is to create a book service, but not using the invoke function

06:51that it has.

06:52This one over here, instead, we are going to create our own using the general constructor

06:57of the book service.

06:59We simply need to create a book repository, a user repository.

07:02And for the reservation repository, we are just going to throw an exception, simulating

07:06a fail when we try to make the reservation of our book.

07:10That means the process of saving it into the database.

07:13A Saga pattern implemented this way will throw the exception up to the higher levels.

07:17And that means that we will have to catch it over here.

07:19This is why I'm running this run catching.

07:21I don't care about the exception because I already know which one it is.

07:24I just care that when I get to the end of this test, that the book is back in the shelf,

07:28that the book count of my user is still zero, and that there are no reservations in the

07:34database.

07:35So let's run the test and see if indeed this test runs successfully.

07:40And remember, this is the failed test there.

07:42So everything's green.

07:43But maybe now you might have questions, because, fine, but maybe it's just

07:47not finding the ticket in the database.

07:49We can, however, put here a breakpoint and then check what happens in the actual database.

07:53So let's run the test again and see what the result is.

07:56We have reached a breakpoint right over here.

07:58And now let's go to our database right over here.

08:01And let's see what is happening to reservations.

08:04There's nothing there.

08:05So there's not a single reservation in the database.

08:08So that means that the book was returned.

08:10The book count is back to zero, and there are no reservations in the database.

08:13Just as we expect a Saga design pattern to behave.

08:25A few years ago, I created a project to investigate circuit breakers, and I implemented

08:29it with Resilience4j.

08:31The project is called From Paris to Berlin.

08:34Nowadays, there is an alternative called Project Arrow.

08:37And the way to implement circuit breakers in Project Arrow is a little bit different,

08:41and I will show you that in this example that I have created.

08:44This example is called UrsusService, and the UrsusService will use a circuit breaker, and

08:51it will simulate an imaginary situation where we are trying to call a bear.

08:56And there's a bear called Ursus that sits behind the service, and we need either to

09:02appease him or we will make him upset.

09:05But before we do that, let's recap what exactly is a circuit breaker.

09:09So a circuit breaker is a design pattern where we can implement a system that can have three

09:14states, a closed state, an open state, and a half open state.

09:18A closed state only means that we can make our requests successfully, there is

09:22no problem, and there's nothing particularly odd going on in our system.

09:27An open state means that our system is now failing, and every request is going to

09:32fail fast.

09:33The half open state is a bit different, and this is where there may be some confusion.

09:38The half open state means that a certain timeout has expired, and now we enter a test mode.

09:45And that means that we will need to get a successful request so that the circuit can

09:49go back to being closed, so that we can send successful requests again.

09:53If not, all the requests that are being sent together with this request simultaneously,

10:00all the other requests will fail, and only one will succeed.

10:04In a half open state, the timeout can increase, and with this particular implementation using

10:08Project Arrow, it will increment in an exponential back off kind of fashion.

10:14Having that in mind, let's have a look at our example.

10:16So we first create a circuit breaker, and we are using this circuit breaker to call

10:20our bear.

10:21But before we do that, let's check out the properties of our circuit breaker.

10:24The first property is the open strategy.

10:26The open strategy in this case means that we are specifying that the maximum count of

10:31failures that we allow for our system to react is one.

10:36That means that after the first try, the system will go from closed to open.

10:41Then we have this property called reset timeout.

10:43The reset timeout is the amount of time that needs to go by when the system goes from closed

10:47to open.

10:48That is the point where we can send our test request, see if it goes successful, make our

10:52system go back to a closed state, or fail, and then let that process happen again.

10:58Let that timeout go by, only that not with two seconds, but with the exponential back

11:03off factor of two.

11:05So that means the first timeout will be two seconds, and the second timeout will be four.

11:09The third time will be eight seconds, and we will see that in a minute.

11:12And finally, we configure the maximum reset timeout to 60 seconds.

11:16So this means that if we get to a maximum reset timeout of 60 seconds, the exponential

11:20back off factor will have no effect anymore, and every single reset timeout will be of

11:2560 seconds.
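
The three states and the growing reset timeout can be sketched as a small state machine. This is a simplified, single-threaded illustration under stated assumptions, not Arrow's `CircuitBreaker`: the class `SimpleCircuitBreaker` and its members are hypothetical, the clock is injectable so no real waiting is needed, and the circuit is assumed to open once the failure count exceeds `maxFailures`, which matches the later run where two failed requests open it.

```kotlin
// Simplified, single-threaded circuit breaker sketch; NOT Arrow's implementation.
// SimpleCircuitBreaker and all member names are hypothetical.
enum class State { CLOSED, OPEN, HALF_OPEN }

class SimpleCircuitBreaker(
    private val maxFailures: Int,
    private val resetTimeoutMs: Long,
    private val backoffFactor: Double = 2.0,
    private val maxResetTimeoutMs: Long = 60_000,
    private val now: () -> Long = { System.currentTimeMillis() },
) {
    var state = State.CLOSED
        private set
    var currentTimeoutMs = resetTimeoutMs
        private set
    private var failures = 0
    private var openedAt = 0L

    fun <T> protect(task: () -> T): T {
        if (state == State.OPEN) {
            // Within the timeout every request fails fast without running the task.
            if (now() - openedAt < currentTimeoutMs) error("circuit is open: failing fast")
            state = State.HALF_OPEN // timeout expired: let one test request through
        }
        return try {
            task().also { close() }
        } catch (e: Exception) {
            onFailure()
            throw e
        }
    }

    private fun close() {
        state = State.CLOSED
        failures = 0
        currentTimeoutMs = resetTimeoutMs
    }

    private fun onFailure() {
        if (state == State.HALF_OPEN) {
            // Failed test request: reopen and wait longer, up to the configured cap.
            currentTimeoutMs = minOf((currentTimeoutMs * backoffFactor).toLong(), maxResetTimeoutMs)
            open()
        } else if (++failures > maxFailures) {
            open()
        }
    }

    private fun open() {
        state = State.OPEN
        openedAt = now()
    }
}

fun main() {
    var clock = 0L // fake clock in milliseconds
    val breaker = SimpleCircuitBreaker(maxFailures = 1, resetTimeoutMs = 2_000, now = { clock })
    repeat(2) { runCatching { breaker.protect<Unit> { error("angry bear") } } }
    println(breaker.state)            // OPEN
    clock += 2_000
    runCatching { breaker.protect<Unit> { error("angry again") } }
    println(breaker.currentTimeoutMs) // 4000
    clock += 4_000
    breaker.protect { "calm" }
    println(breaker.state)            // CLOSED
}
```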

11:26So let's start with our circuit breaker, and test how this works.

11:30The first thing is to say hello to a bear.

11:32So to do that, we can go jump right in over here, and just say hello to the bear.

11:38But the bear is a weird one, and this bear is very irritated.

11:43And so we just say hello, and the first thing that the bear says without any kind of introduction

11:47is just, hi there, I am Ursus, oh my god, are you saying that my code is the best?

11:53So this is the first reaction the bear has.

11:55We, because we are talking to him, we kind of don't like this kind of introduction, so

11:59we start saying, oh, maybe your code isn't that great.

12:03And so when we doubt his experience, the bear is going to say, I have tested it, it works

12:09perfectly, I am an angry bear, and my code is the best.

12:12When the bear does this, it throws an exception, so it gets an overdrive, and we cannot reach

12:16him.

12:17So this will fail.

12:18This will fail because it uses this protectOrThrow.

12:21You probably noticed this before.

12:23We are using this protectOrThrow to test whatever we are putting over here.

12:26But let's go back to this situation.

12:28We have now created one fail.

12:30But let's try again.

12:31Let's make the bear really angry and doubt his experience once again.

12:34Once we do that, we should expect here the circuit to go from closed to open.

12:39And that means that the reset timeout kicks in.

12:43And so what we do immediately after that, without letting the two seconds go by, we

12:49run this protectEither.

12:51And the protectEither stops an exception being thrown.

12:54It uses an either, and that means that on the left, we get an exception, and on the

12:57right, we get a unit, which should be the successful case.

13:02And we should observe here that the circuit is in an open state.

13:06And the bear is now saying, I am trying to calm down.

13:08Are you doubting my experience?

13:10So he is still in overdrive, but now it's not enough to break the system.

13:13So the system is getting back to its normal position.

13:15Right afterwards, the bear says, me, the Ursus, is going to sleep.

13:19I need to calm down, but I have tested everything and everything works.

13:22And we are going to see here also in the circuit breaker state that it is still open.

13:28But now we are causing a delay of two seconds.

13:30And now Ursus is chilling out and is kind of calming down.

13:34So after two seconds, the bear is trying to calm down, but no successful request has happened

13:40yet.

13:41In spite of the time that's gone by, the state remains open.

13:44It will only be closed after the first successful request and after this timeout has passed,

13:49has expired.

13:50But now we're going to do something bad.

13:52And that is, we're going to doubt his experience again.

13:55So the bear is going to be upset again.

13:56I have tested it, it works perfectly, I am an angry bear, and my code is the best.

14:00And so we are now expecting the delay to be four seconds.

14:05So we make a two times delay of two seconds.

14:08You will see in a moment why I am using two delays.

14:10This is to make a small test to see another behavior of the circuit breaker in a moment.

14:14But for now, let's now try to calm down our bear.

14:17Now he responds, I am trying to calm down.

14:19And then we are going to have a look at the state.

14:21So at this point, we are supposed to observe an open state still.

14:24However, the timeout has now expired.

14:27And that means that our circuit breaker is open for other requests.

14:32So our bear is open to another test to see if we can calm him down.

14:39So in our final attempt, we get this response from the bear.

14:41I cannot sleep, but I can calm down, or at least pretend to.

14:45And at this point, when we send this request, we should see here in this state a half open

14:51state.

14:51That is, after the first successful request, the circuit breaker changes state from open

14:56to half open.

14:58Finally, when we leave this protectOrThrow, at the end, we are supposed to see that the

15:02state is back to closed.

15:04And from here onwards, we can now send our successful requests as long as we don't make

15:09a fail.

15:09So this is a resilient way to create a system.

15:12This is not the actual application of a circuit breaker.

15:15This is an example to see how it works.

15:16This can work with a lot of requests.

15:18But now let's run and see if what I just said matches reality.

15:23So let's run the Ursus service.

15:25And now we are seeing here, hi there, I am Ursus, all of that text.

15:28And then finally, we see that the state is closed.

15:31So we can see here the results.

15:32In the first case, we see, hi there, I'm Ursus, oh my god.

15:36And then we see that the result is that the state is closed.

15:40So nothing has happened here.

15:41One successful request, our bear is still happy.

15:43Then we make him upset.

15:45And there are two wrong requests.

15:48So in this case, there are two fails.

15:49And that means that our circuit will go from state closed to open.

15:53While the bear is trying to calm down, we realize that we also cannot make contact with

15:58him.

15:58And so when we make our protectEither call, it returns none.

16:02And while the delay doesn't happen, the bear is still upset, but it's now calming down.

16:06So here the bear is trying to calm down.

16:08And although there are no problems with our specific request, we still get an error because

16:14the circuit breaker state is still open.

16:16In this one, the timeout has already passed.

16:18The two seconds have already gone by.

16:20And we see already something curious is that the reset timeout is still set to two seconds.

16:25So at this point, we would be able to say something to appease the bear.

16:28But we don't do that.

16:29We make him angry.

16:30And so therefore, we doubt his experience once again.

16:34He gets angry.

16:34And of course, the state remains open.

16:37At this point, the four seconds have already gone by.

16:39And when we finally say something to appease the bear, he will respond, I cannot sleep

16:44and the rest of it.

16:45And finally, we see that we get a half open state.

16:49So this means that we've got a successful request after the timeout has passed from

16:53the first transition from closed to open.

16:56And finally, of course, the circuit breaker status goes back to a closed state where we

17:00can now send our requests without any problem.

17:03And the bear finally gets happy with his interpretation of the story.

17:08And finally, you accept that the rest of it.

17:11But we still have that example that I wanted to show you.

17:13The question you may be having is what happens if we don't wait here for the four seconds

17:19to go by.

17:20And we try to make this protectOrThrow at the last point when the bear starts to actually

17:26calm down.

17:26Let's see what happens.

17:28An exception has been thrown.

17:29And this is the whole point of circuit breaker.

17:31In this case, an exception has been thrown because we are trying to make a request within

17:37that timeout.

17:38So that means that our circuit is open and it's not ready to accept a test request.

17:42So we can interpret this as sending a request too soon before the circuit closes again.

17:47This is important because then we can prevent a lot of wrong requests from generating problems

17:51within our system that the system needs to process.

17:54Circuit breakers prevent our system from going into overdrive when receiving wrong requests

17:58and processing those requests that are prone to error.

18:01It identifies an event where massive requests are being sent that are prone to error and

18:05stops that process from continuing further, therefore protecting our application.

18:10And by the way, a fun fact, you probably realized that Ursus has something to do with bears.

18:15Ursus is the Latin word for bear.

18:22At this point in time, you probably have already heard of Project Arrow.

18:25And if you haven't, please bear with me because I will explain to you a bit of what it does.

18:30Project Arrow allows us to simplify our Kotlin code.

18:33And one of the things that it provides us with is this higher order function called

18:38nullable.

18:39And nullable gives us this NullableRaise which provides us with a monadic context.

18:46And this monadic context allows us to use bind in our variables.

18:51Our variables, none of them are monads per se, but bind in this case will test if our

18:58variables are null or not.

19:00In case they are null, what will happen is that the function execution will return to

19:05the call point with a null value instead of a null pointer exception.

19:09We don't always want to throw an exception if we have a null value.

19:13And we don't always want to use non-nullable variables.

19:18Arrow solves that for those specific cases.

19:21And if we look at the logs of one execution of this code, we will see that in the first

19:27execution, everything runs okay because I'm not specifying as parameter a simulation of

19:33a fail.

19:34And so we will get all the logs correctly.

19:37In the second log, we get nothing.

19:40We simply get an empty array.

19:42That empty array comes from the fact that if we ask to simulate one fail, what will

19:49happen is that we will use ensure.

19:51And this is another function that will check that whatever we pass through as a parameter

19:57will be true or not.

19:58And in case that it's not true, then it will fail and then it will return a null.

20:03And this null will then be transformed to an empty list.

20:09And finally, in the third execution, we ask it to fail for the second case.

20:14And in the second case, what will happen is that we will check if our list is...

20:20In the second case, we seem to be getting an empty array.

20:23This is actually an empty list that we get

20:26if the function returns a null value at this point.

20:30So then we do an orEmpty and therefore we get an empty list.

20:35And in the third case, we do see that the first bit of the execution went well.

20:39We do get a log with the output of all of the contents of the list.

20:43We see it as an array in the logs.

20:45And the last log line is an empty list that gets transformed out of the return null.

20:51How does the null get generated?

20:54It's simple.

20:56The second fail is generated if our list isn't smaller or equal to a thousand.

21:03And therefore, when we do this takeIf, the return result will be null,

21:09which then via bind will shortcut the function at this point

21:14to return a null value up to here, which then gets converted to an empty value.
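
The short-circuit behaviour of bind and ensure can be imitated with a tiny scope class. This is an illustrative sketch under stated assumptions, not Arrow's `nullable` builder and not the code from the video: `nullableSketch`, `NullableScope`, `NullShortCircuit` and `upperCased` are hypothetical names, and the early return is implemented with an internal exception purely for demonstration.

```kotlin
// Tiny imitation of a null short-circuiting block; Arrow's nullable is NOT used.
// All names here are hypothetical and exist only for this illustration.
class NullShortCircuit : RuntimeException()

class NullableScope {
    // Continue with the non-null value, or abandon the whole block.
    fun <T : Any> T?.bind(): T = this ?: throw NullShortCircuit()

    // Abandon the whole block when the condition does not hold.
    fun ensure(condition: Boolean) {
        if (!condition) throw NullShortCircuit()
    }
}

// Runs the block and turns an abandoned block into a plain null result.
fun <T> nullableSketch(block: NullableScope.() -> T): T? =
    try {
        NullableScope().block()
    } catch (e: NullShortCircuit) {
        null
    }

fun upperCased(names: List<String>?, failFirst: Boolean = false): List<String> =
    nullableSketch {
        ensure(!failFirst)                            // first possible failure
        val present = names.bind()                    // null input ends the block here
        present.takeIf { it.size <= 1000 }.bind()     // takeIf yields null for big lists
            .map { it.uppercase() }
    }.orEmpty()                                       // null becomes an empty list

fun main() {
    println(upperCased(listOf("a", "b")))             // [A, B]
    println(upperCased(null))                         // []
    println(upperCased(listOf("a"), failFirst = true)) // []
}
```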

21:26If you're implementing a pure function in Kotlin,

21:28meaning a function that does not have any side effects

21:31and produces the same result after computations with the same input,

21:34then you might be interested in knowing that Project Arrow

21:37has an implementation of a process called memoization.

21:41And to do that, we only need to apply an extension function called memoize

21:45to our original function.

21:46I have an example of that called square root of.

21:49And square root of simply calculates, of course, the square root of a number

21:53with a sleep of one second to simulate a delay in calculations.

21:57And to test this method, what I have done is a test

22:01that simply calls the memoized version of our function

22:06three times.

22:08To get our first result, we make a call to our transformed function

22:11with number 1764 as an argument.

22:14The same for the second result.

22:16For result three, we make a nested call of the same function

22:21with that number, which will result in the first call to number 1764.

22:28And therefore, the same subsequent call will be made.

22:31As a result, this function will make two actual calls,

22:35which take one second.

22:36And the other calls are all already cached in.

22:40So this means that the total time of execution of this unit test

22:45should be around two seconds.

22:46Let's test if that's true.

22:47So we can see over here that the tests are successful.

22:50And we can see that the three results are printed out into the console

22:53with number, of course, 42.

22:56And the operation took about two seconds to be complete.

23:01But however great memoization is and how well this is implemented in Arrow,

23:05we need to take note that there seems to be no eviction policy

23:09yet available in Project Arrow for this kind of memoization.

23:13And therefore, we have to be careful if we want to use it.

23:16Make sure that your function is pure,

23:17that the results are all the same for the same inputs,

23:21and that the computations don't change.
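
The caching idea behind memoization can be sketched with a plain map. This is an illustration under stated assumptions, not Arrow's `memoize`: `memoizeSketch`, `slowSquareRoot` and `realCalls` are hypothetical names, the one-second sleep is replaced by a call counter, and the inner argument of the nested call (3111696, the square of 1764) is an assumption chosen so that it produces two real computations. Like the version discussed above, the cache here never evicts anything, and this sketch is not thread-safe.

```kotlin
// Map-backed memoization sketch; Arrow's memoize is NOT used here.
// memoizeSketch, slowSquareRoot and realCalls are hypothetical names.
fun <P, R> ((P) -> R).memoizeSketch(): (P) -> R {
    val cache = mutableMapOf<P, R>() // grows forever: no eviction policy
    return { input -> cache.getOrPut(input) { this(input) } }
}

var realCalls = 0 // counts how often the expensive function really runs

fun slowSquareRoot(n: Double): Double {
    realCalls++ // stands in for the simulated one-second delay
    return kotlin.math.sqrt(n)
}

fun main() {
    val fast = (::slowSquareRoot).memoizeSketch()
    val r1 = fast(1764.0)           // real call
    val r2 = fast(1764.0)           // served from the cache
    val r3 = fast(fast(3111696.0))  // inner: real call returning 1764.0, outer: cached
    println("$r1 $r2 $r3 realCalls=$realCalls") // 42.0 42.0 42.0 realCalls=2
}
```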

23:48As a short disclaimer, I'd like to mention that I'm not associated or affiliated

23:51with any of the brands eventually shown, displayed, or mentioned in this video.
