Batch Scheduling and Enterprise Orchestration with .Net - and why is it so complex?

Logical Units of Work

Microservices and Promises are both examples of the idea that an application should do one single task. Less experienced developers tend to build complicated applications containing many code-heavy steps, rather than breaking the work into smaller steps. Smaller units let engaged parties (support staff, developers, testers, managers) understand the result of each step. In many situations, Logical Units of Work (LUoW) do result in slower systems, but in terms of support and understanding what has happened at each step, they are far easier to interpret. A quick example;

An SSIS Package which;

  • Copies a number of files to a folder.
  • Opens each file.
  • Imports them to a database.
  • Archives the files.
  • Sends an email to somebody.

OR

A series of separately managed processes/units of work;

  • One which copies a file to a folder.
  • One which imports a file to a database.
  • One which sends a notification.

Key advantages of LUoW

Suddenly, we know exactly at which point something failed and can restart from that point. Either an automation agent or a human operator can take over. We also gain more flexible services, capable of producing different outputs, controlled from outside the internal processing logic of the application.
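
To make this concrete, here is a minimal C# sketch of the second style, assuming nothing beyond the standard library; the IUnitOfWork interface and the class names are illustrative guesses, not part of any existing framework. Each unit does one thing and reports its own outcome, and the orchestration sits outside the units.

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Illustrative only - the interface and class names are hypothetical, not from an existing framework.
public interface IUnitOfWork
{
    string Name { get; }
    Task<bool> ExecuteAsync(CancellationToken token); // true = success, false = failure
}

public sealed class CopyFilesUnit : IUnitOfWork
{
    private readonly string _source;
    private readonly string _target;
    public CopyFilesUnit(string source, string target) { _source = source; _target = target; }
    public string Name => "CopyFiles";

    public Task<bool> ExecuteAsync(CancellationToken token)
    {
        foreach (var file in Directory.EnumerateFiles(_source))
        {
            token.ThrowIfCancellationRequested();
            File.Copy(file, Path.Combine(_target, Path.GetFileName(file)), overwrite: true);
        }
        return Task.FromResult(true);
    }
}

public static class UnitRunner
{
    // The orchestrator - not the units themselves - decides ordering, restarts and notifications.
    public static async Task RunAsync(IEnumerable<IUnitOfWork> units, CancellationToken token)
    {
        foreach (var unit in units)
        {
            var succeeded = await unit.ExecuteAsync(token);
            Console.WriteLine($"{unit.Name}: {(succeeded ? "OK" : "FAILED")}");
            if (!succeeded) break; // an operator or automation agent can resume from this unit
        }
    }
}

An import unit and a notification unit would follow the same shape; the point is that each unit's result is visible at the boundary rather than buried inside one large package.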

When are Logical Units of Work not a good idea?

If there is high transactional volume, or the operations are transactional, LUoW are not ideal. If you need in-process execution and raw performance, LUoW isn't for you.

More about LUoW

Think about logical units of work as being synonymous with batch scheduling at the granular level.

A bit about my property platform, and why I developed an alternative approach to process automation

I have spent the last year completely redefining my property platform - findigl. To be honest, it has been an insane amount of work, and the goal has been to;

  • Speed up development by developing configurable applications.
  • Avoid using more traditional ETL tools because of the time taken to configure them.
  • Ensure there is a relatively small lead time in setting up new ETL processing.
  • Have a framework capable of delivering different solutions for different use cases and clients.

Fundamentally, there is not a team of 4-10 developers, a project manager, and business analysts to run the project.

At the end of this development phase, there will be;

  • A fairly lightweight set of tools to take data into the system without significant development.
  • Some mechanisms to aid with deployment and re-deployment of changes.
  • A lack of front-ends for now, but with a commitment to improve the configuration process by creating appropriate front-ends later.
  • An awesome property platform - naturally.

A little about job scheduling and batch execution

One of the most esoteric tasks for most people involved in technology is batch scheduling. I have direct experience developing on Dollar Universe (challenging) and Control-M (clunky, but the standard), and have worked with other frameworks.

What we are talking about is workflow automation. The reason we use batch scheduling with a GUI is to allow operators who are less development-minded to run and manage batches. If you have worked in large enterprises such as banks or insurers, it is common to find batch scheduling frameworks such as Control-M, Autosys or Dollar Universe. If you haven't worked in those industries, you will almost certainly be stuck with even worse batch execution tooling such as Oracle's scheduler or SQL Server Agent.

What is so painful about batch execution frameworks

  • Slow to develop.
  • Difficult to maintain.
  • Requires specialist knowledge.
  • Often requires duplication of effort.
  • Significantly slows down the release process.
  • Hard to dynamically schedule jobs. If you execute the same task with different parameters, you need to create each job manually (see the sketch after this list).
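
To illustrate that last point, here is a small, hypothetical C# sketch of generating job definitions from data rather than hand-creating each one; the JobDefinition record, the region parameters and the job names are made up for the example, and no real scheduler's format is implied.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape - real schedulers (Control-M, Autosys, Dollar Universe) each have their own job formats.
public sealed record JobDefinition(string Name, string Executable, string Arguments);

public static class JobFactory
{
    // One job per parameter set, generated from data instead of duplicated by hand in a GUI.
    public static IEnumerable<JobDefinition> ForRegions(IEnumerable<string> regions) =>
        regions.Select(region => new JobDefinition(
            $"ImportTrades_{region}",
            "Importer.exe",
            $"--region {region}"));
}

// Usage:
// var jobs = JobFactory.ForRegions(new[] { "EMEA", "APAC", "AMER" }).ToList();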

Alternatives to batch scheduling frameworks in the .Net world

As is often the case with .Net, you think there will be a myriad of applications to do something, and you find there is almost nothing. For example, the only .Net Core CMS at this moment in time is Piranha. It is the same with batch scheduling.

I found;

Naturally, there are workflow automation engines which are more GUI-based. Each of these projects is very interesting and, with a slightly larger team, these products would add a lot of value. We could potentially write some dynamic discovery code and embed my tasks into them, but I felt they are too heavyweight. Hangfire.IO, I found, just didn't offer much I couldn't do myself - except for the nice front-ends.

What do most batch workflow automation applications seem to miss?

People don't seem to care that things take a really long time to do. Developing software and configuring applications takes a lot of time. Applications without front-ends are hard to conceptualise; applications which are more code-based are hard to maintain and understand, becoming more esoteric and abstracted.

Effectively, batch scheduling frameworks seem to celebrate a lot of process. Developers hate them.

So, the common approach is to create more programmable solutions, allowing developers to write their own wrappers to execute jobs. But we have to ask - what is the point? Why would we want to have a bunch of tasks/jobs, integrate a separate framework which needs to reference those jobs, then compile and deploy a separate application which must be recompiled every time the underlying tasks change?

The insanity of job scheduling

In many enterprises, experience has shown processes along the lines of;

  1. Developers develop their applications.
  2. Create jobs within a job scheduling software.
  3. Release the artefacts to test.
  4. Have somebody reconfigure the parameters for test.
  5. Do the same for production. Bad news if you also have pre-prod.

A genuine use case where the tail wags the dog with Batch Scheduling

A key challenge for a developer is joining a long-established project. Rarely do we find the solution well architected. The state of these applications cannot be attributed to any one individual; it is more a continual drive to add more features, never address underlying design issues, cut corners and - the ultimate - avoid regression.

One of the key failures on a trade surveillance project was its inability to respond to trade volume by calling more processing engines. Developers had to second-guess how many engines to configure, and each engine then had to be created as a batch job. Some form of data-driven scheduling, determining the number of engines to run from the data itself, would have kept the internal complexity down and reduced the amount of batch job development.
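
A rough sketch of what that data-driven approach could look like, assuming a hypothetical SurveillanceEngine.exe and an arbitrary trades-per-engine threshold; none of these names or numbers come from the original project.

using System;
using System.Diagnostics;
using System.Linq;

public static class EngineScaler
{
    // Derive the number of engines from the day's trade volume rather than hard-coding batch jobs.
    public static int EnginesFor(long tradeCount, long tradesPerEngine = 250_000, int maxEngines = 12) =>
        Math.Clamp((int)Math.Ceiling(tradeCount / (double)tradesPerEngine), 1, maxEngines);

    public static void LaunchEngines(long tradeCount)
    {
        var engineCount = EnginesFor(tradeCount);
        var engines = Enumerable.Range(1, engineCount)
            .Select(i => Process.Start("SurveillanceEngine.exe", $"--partition {i} --of {engineCount}")) // hypothetical executable
            .ToList();
        foreach (var engine in engines)
            engine?.WaitForExit(); // wait for every partition before downstream batch steps run
    }
}

The scheduler then only needs one job - "launch engines for today's volume" - instead of a manually maintained job per engine.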

My approach to job scheduling

So, I thought long and hard about how I would manage and automate a lot of complex processing. One of the immediate synergies I saw was between release management, configuration and deployment; they seem inherently intertwined. I have written applications to manage deploying applications with different configurations. While thinking about this, I installed Jenkins and looked at TeamCity, but still can't commit to them for this.

The goal is not to pretend that this framework could replace Jenkins or TeamCity. Instead, we are investigating whether process automation can be more lightweight, with Continuous Integration being part of the development lifecycle.

The key principles of the Info Rhino job scheduler are;

  • You should not need to manually configure every job.
  • Configuration can be copied after the core application artefacts are released.
  • Some jobs should be executable in parallel.
  • Jobs can be batched.
  • At some point, a front-end will help to simplify this for non-developers.
  • There should be a focus towards convention, automation and simplicity.

Thinking through how job scheduling can be enhanced

It seemed obvious that the same application can be executed, as a separate process, with a different configuration. In addition, we may want to apply the same jobs per iteration of a job set. Here is what I mean, as an example;

  • There are 50 downloads to do.
  • Each download needs a refresh and archive per set of downloads.
  • We may want to batch the downloads into ten sets of five parallel downloads.
  • Why manually configure 60 jobs? (A batching sketch follows this list.)
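
Here is a minimal sketch of that example, assuming .NET 6+ for Enumerable.Chunk and leaving the actual download, refresh and archive logic as delegates; the fifty URLs and batch size of five come from the bullet points above, everything else is illustrative.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class DownloadBatcher
{
    private static readonly HttpClient Client = new HttpClient();

    // Runs the downloads in batches, then performs the per-set refresh and archive steps.
    public static async Task RunAsync(IReadOnlyList<string> urls,
                                      Func<Task> refreshAsync,
                                      Func<Task> archiveAsync,
                                      int batchSize = 5)
    {
        foreach (var batch in urls.Chunk(batchSize))        // fifty URLs become ten batches of five
        {
            var downloads = batch.Select(url => Client.GetByteArrayAsync(url));
            await Task.WhenAll(downloads);                  // five parallel downloads per batch
            await refreshAsync();                           // one refresh per set of downloads
            await archiveAsync();                           // one archive per set of downloads
        }
    }
}

One short method replaces sixty hand-configured jobs, and changing the batch size is a parameter rather than a re-plumbing exercise.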

A practical example of how my existing application works - IRProcessorExecutor and IRProcessor

I decided it would be better to have a definition file (json) which is generated by discovering matched binaries (applications).

  1. Define an Executable Group by name.
  2. Create a set of executable definitions - for example, Importer.exe, Reset.ps1, Update.exe.
  3. State that we can have a maximum of 6 Importers running at the same time.
  4. Include wildcard matches.
  5. Point the application at a folder and discover jobs (a hypothetical sketch of the result follows).
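
I am not reproducing the real schema of the generated definition file here, so the following C# is a guess at what the ExecutableGroup shape and the discovery step might look like; the property names, and the assumption that wildcards map onto Directory.EnumerateFiles patterns, are mine rather than IRProcessor's.

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

// Hypothetical model - the real ApplicationExecution.json may differ.
public sealed class ExecutableGroup
{
    public string Name { get; set; } = "";                         // 1. Executable Group name
    public List<string> ExecutablePatterns { get; set; } = new();  // 2/4. e.g. "Importer*.exe", "Reset.ps1", "Update.exe"
    public int MaxParallel { get; set; } = 6;                      // 3. maximum of 6 Importers at the same time
    public List<string> DiscoveredJobs { get; set; } = new();
}

public static class JobDiscovery
{
    // 5. Point the application at a folder and discover jobs matching the wildcard patterns.
    public static ExecutableGroup Discover(ExecutableGroup group, string rootFolder)
    {
        group.DiscoveredJobs = group.ExecutablePatterns
            .SelectMany(pattern => Directory.EnumerateFiles(rootFolder, pattern, SearchOption.AllDirectories))
            .ToList();
        return group;
    }

    public static void WriteDefinition(ExecutableGroup group, string path) =>
        File.WriteAllText(path, JsonSerializer.Serialize(group, new JsonSerializerOptions { WriteIndented = true }));
}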

How could it work in theory?

Imagine we have 30 Importer packages. We deploy them to a folder per package. We replicate the reset and update apps with a different configuration. After running the IRProcessorExecutor, we end up with 5 batches (30 Importers, with a maximum of 6 running at the same time).

IRProcessor simply needs to know the location of the ApplicationExecution.json file.

What does this mean in practice?

Suddenly, with minimal configuration, we can discover lots of jobs and execute them without having to spend lots of time setting up jobs and parameters. We can still use other tooling if required - we may well decide to use a batch framework to orchestrate the Processor jobs - but we reduce the amount of manual job creation and deployment.

How do we maintain visibility over job success and completion?

This is one of the interesting points. Batch frameworks tend to rely upon each executable or service called supplying a return code denoting success or failure, for example. This really is a huge topic in itself; what is a failure to one person may be acceptable to another. My personal belief is that we need to think in terms of responsibilities;

  • Allow each application/service to decide on abort/failure status.
  • Have a separate monitoring application to analyse logs from each application.
  • Centrally report these.
  • Produce appropriate notifications to operators/agents, permitting decision making (a small sketch follows this list).
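
As a sketch of the first and third responsibilities, here is a small wrapper that runs an executable, lets the application decide its own exit code, and writes a structured outcome record that a separate monitoring application could collect, centralise and report on; the folder layout and record shape are assumptions, not an existing Info Rhino component.

using System;
using System.Diagnostics;
using System.IO;
using System.Text.Json;

public static class JobRunner
{
    // Runs an executable, lets it decide its own exit code, and records the outcome for a monitoring app.
    public static int Run(string executable, string arguments, string outcomeFolder)
    {
        var started = DateTimeOffset.UtcNow;
        using var process = Process.Start(new ProcessStartInfo(executable, arguments)
        {
            UseShellExecute = false
        })!;
        process.WaitForExit();

        var outcome = new
        {
            Executable = executable,
            Arguments = arguments,
            ExitCode = process.ExitCode,        // non-zero conventionally denotes failure
            StartedUtc = started,
            FinishedUtc = DateTimeOffset.UtcNow
        };
        // A separate monitoring application can scan this folder, centralise the results and notify operators.
        var fileName = $"{Path.GetFileNameWithoutExtension(executable)}_{started:yyyyMMddHHmmss}.json";
        File.WriteAllText(Path.Combine(outcomeFolder, fileName), JsonSerializer.Serialize(outcome));
        return process.ExitCode;
    }
}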

Many thanks for reading this article

Again, this is more of a work in progress, but I am quite close to completing it for the findigl property platform and other small pieces of work being undertaken by Info Rhino. If you are interested in learning more, contact this blog or email solutions@inforhino.co.uk. I think this approach can really improve an organisation's efficiency in scheduling its software.

I still haven't thought about putting this on GitHub yet. I am thinking about outsourcing some development of a front-end to help make this a more intuitive solution, so let me know if you are interested in getting involved.
