Thursday, August 27, 2020

Automating API creation over existing data-sources

If you need to quickly build REST APIs over existing data in Excel sheets, RDBMS, or NoSQL data stores, then the DreamFactory toolkit is immensely helpful.

DreamFactory is available under the Apache 2 license and you can host it on your public/private cloud platforms. With just a few clicks, you will have a complete, secure API service available to perform CRUD operations on your data. DreamFactory runs on the LAMP stack (Linux, Apache, MySQL, PHP/Perl/Python).
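
As a rough illustration of what calling such a generated API could look like from Java, here is a minimal sketch that only builds the request. It assumes DreamFactory's usual URL convention of /api/v2/{service}/_table/{table} and the X-DreamFactory-API-Key header; the base URL, service name, table name and key are hypothetical placeholders, so check them against your own instance.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DreamFactoryRequestSketch {

    /**
     * Builds a GET request for the first `limit` rows of a table.
     * The URL layout and header name are assumptions based on DreamFactory's
     * documented conventions; all argument values are placeholders.
     */
    static HttpRequest listRows(String baseUrl, String service, String table,
                                int limit, String apiKey) {
        URI uri = URI.create(baseUrl + "/api/v2/" + service + "/_table/" + table
                + "?limit=" + limit);
        return HttpRequest.newBuilder(uri)
                .header("X-DreamFactory-API-Key", apiKey)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = listRows("https://df.example.com", "mysql", "employees", 10, "YOUR_KEY");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request can then be sent with java.net.http.HttpClient (Java 11+); creates, updates and deletes would use POST, PATCH/PUT and DELETE against the same resource path.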

Besides generating the APIs, DreamFactory can also dynamically generate a Client SDK for HTML5 frameworks like jQuery, AngularJS, and Sencha, and a code library for native clients like iOS, Windows 8, and Android.

DreamFactory also has a toolkit to convert your SOAP webservices into REST APIs - https://www.dreamfactory.com/resources/video/how-turn-any-soap-web-service-rest-api/

It also has a cool feature called DataMesh, which lets you create virtual foreign key relationships between tables in the same database, or between completely different databases, without altering your schema or writing any code. You can then create, read, update, or delete objects and related objects with a single API call.

DreamFactory also automatically creates Swagger API documentation and has basic API management capabilities built in (e.g. rate limiting).

Displaying PDF, Word, PPT docs on web pages

 If you need to display documents in PDF, Word or PPT formats on your web application, then the following JavaScript toolkit will come to your rescue. 

ViewerJS - https://viewerjs.org/

ViewerJS uses 2 other JS libraries behind the scenes - PDF.js (http://mozilla.github.io/pdf.js/) and WebODF (https://webodf.org/)

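Any document that follows the Open Document Format (ODF) can be rendered using ViewerJS. Note that Microsoft Office saves Word and PowerPoint files as Office Open XML (.docx, .pptx) by default, not ODF, so such files have to be saved in an ODF format (.odt, .odp) first; Office supports this as a save option.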

Wednesday, August 12, 2020

Ruminating on the hype around hyper-automation

 So what exactly is hyper-automation?

In simple terms, 'hyper-automation' means going beyond plain RPA. In RPA, the bot just mimics what a human would do. 

But in hyper-automation, along with the RPA tool, we use other technologies like AI/ML, workflow engines and rules engines to automate end-to-end (E2E) process flows. Thus we make the business process more streamlined and robust, essentially better than what the human was doing. 

Even Process Discovery Tools are considered to be an important part of the hyper-automation journey. Process Discovery Tools enable us to understand the current-state business processes and offer suggestions on streamlining them. It does not make sense to automate a fragmented process; we should first streamline the process, remove redundancies and only then automate it. 

Example - A bot reading an unstructured email and understanding its intent, extracting relevant data points from it (AI) and completing the requested transaction via a desktop app (RPA). 


Wednesday, June 24, 2020

Ruminating on Asynchronous Request-Reply pattern over HTTP

Quite often, an HTTP request entails processing on some back end that communicates via messaging. In such cases, do we keep the server thread waiting for a response on the queue, or is there a better design pattern to handle such scenarios?

The following article from Microsoft illustrates a good approach, the Asynchronous Request-Reply pattern over HTTP - https://docs.microsoft.com/en-us/azure/architecture/patterns/async-request-reply

Excerpts from the article: 
  1. The client sends a request and receives an HTTP 202 (Accepted) response.
  2. The client sends an HTTP GET request to the status endpoint. The work is still pending, so this call also returns HTTP 202.
  3. At some point, the work is complete and the status endpoint returns 302 (Found) redirecting to the resource.
  4. The client fetches the resource at the specified URL.
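
The four steps above can be sketched end to end with nothing but the JDK (Java 11+). This is only an illustration: the stub server, its paths (/orders, /status/1, /results/1) and the "done on the third status check" rule are all made up for the demo, and a real back end would track job state via its queue.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncRequestReplyDemo {

    /** Stub back end: the job "finishes" on the third status check. All paths are hypothetical. */
    static HttpServer startStubServer() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        AtomicInteger statusChecks = new AtomicInteger();

        // Step 1: accept the work (202) and tell the client where to poll.
        server.createContext("/orders", exchange -> {
            exchange.getResponseHeaders().add("Location", "/status/1");
            exchange.sendResponseHeaders(202, -1);
            exchange.close();
        });
        // Steps 2 and 3: 202 while pending, then 302 pointing at the finished resource.
        server.createContext("/status/1", exchange -> {
            if (statusChecks.incrementAndGet() < 3) {
                exchange.sendResponseHeaders(202, -1);
            } else {
                exchange.getResponseHeaders().add("Location", "/results/1");
                exchange.sendResponseHeaders(302, -1);
            }
            exchange.close();
        });
        // Step 4: the resource itself.
        server.createContext("/results/1", exchange -> {
            byte[] body = "order-1-complete".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    static HttpResponse<String> get(HttpClient client, String url) throws Exception {
        return client.send(HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
    }

    /** Client side: submit, poll the status URL until it redirects, then fetch the resource. */
    static String submitAndWait(String baseUrl) throws Exception {
        // Redirects are not followed by default, so the 302 stays visible to this code.
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        HttpResponse<String> accepted = client.send(
                HttpRequest.newBuilder(URI.create(baseUrl + "/orders"))
                        .POST(HttpRequest.BodyPublishers.noBody()).build(),
                HttpResponse.BodyHandlers.ofString());
        String statusUrl = baseUrl + accepted.headers().firstValue("Location").orElseThrow();

        for (int attempt = 0; attempt < 10; attempt++) {
            HttpResponse<String> status = get(client, statusUrl);
            if (status.statusCode() == 302) {
                String resourcePath = status.headers().firstValue("Location").orElseThrow();
                return get(client, baseUrl + resourcePath).body();
            }
            Thread.sleep(100); // still 202: wait before polling again
        }
        throw new IllegalStateException("work did not complete in time");
    }

    /** Runs the whole exchange against the stub and returns the fetched resource body. */
    static String runDemo() {
        try {
            HttpServer server = startStubServer();
            try {
                return submitAndWait("http://127.0.0.1:" + server.getAddress().getPort());
            } finally {
                server.stop(0);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The key design point is that no server thread blocks while the back end works: each HTTP call returns immediately, and the client decides how often to poll.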


Sunday, May 31, 2020

Ruminating on Mutual Authentication

In mutual authentication, both the server and the client have digital certificates and authenticate each other. If both the server and the client are using CA-signed certificates, then everything works OOTB and there is no need to import any certificates. This is because the default trust stores of both the server and the client already contain the root certificates of most CAs.

But during testing and in lower environments, teams often use self-signed certificates. To enable mutual authentication using self-signed certificates, we have 2 options. 
  • Peer-2-Peer: Create a client certificate for each agent. Import each of these certs into the trust store of the server. 
  • Root cert derived client certificates: Create a client root certificate and use it to create/derive (sign) client certs for each agent. Then you just have to import the client root certificate into the server trust store (and not the certs of all the agents).

Thursday, May 28, 2020

Ruminating on Azure RTOS

Microsoft acquired ThreadX from Express Logic and re-branded it as Azure RTOS. ThreadX was already a popular RTOS, used by more than 6.5B devices worldwide.
** Gartner predicts that by 2021, one million new IoT devices will come online every hour of every day. In 2019, there were approx 27B IoT devices.

Besides ThreadX, Azure RTOS has also packaged other modules such as GuiX, FileX, NetX, USBX, etc. 

The below link points to an interesting conversation with Bill Lamie - founder of ThreadX. 

Jotting down some interesting points below. 
  • The most important characteristic of an RTOS is size. RTOS size is typically in KB, whereas general purpose OS is in MB or GB. Because of this size, RTOS can be used in the smallest of devices...even battery powered ones - e.g. fitness wearables, medical implants, etc. So essentially RTOS is great for constrained/smaller devices. 
  • RTOS is "real-time" because the OS responds to real-time events in a deterministic time frame. An RTOS guarantees that certain actions can happen on IoT devices within defined time limits - a feature called determinism. 
  • The size of Azure RTOS can scale down all the way to 2KB. A cloud connected RTOS would take 50KB.
  • Azure RTOS also brings in best-of-class security with multiple security certifications. 
  • The complete source code of Azure RTOS is open-source and available on GitHub at https://github.com/azure-rtos

Before the acquisition of Express Logic, Microsoft had an offering called Azure Sphere OS that was positioned as an OS for edge devices. Azure Sphere is more secure and is based on the Linux kernel, but it cannot run on highly constrained devices. Since it is not an RTOS, it also cannot provide deterministic execution. 

Though Microsoft is currently stating that Azure RTOS and Azure Sphere are complementary, only time will tell which OS the industry adopts. 

Saturday, April 18, 2020

Performance instrumentation via DataDog

Recently my team was looking for a solution to implement custom metrics in Java microservices that would ultimately be fed to DataDog. We explored the following options to add custom performance instrumentation.
  • Using StatsD: StatsD is a network daemon that runs on the Node.js platform, listens for statistics such as counters and timers sent over UDP or TCP, and sends aggregates to one or more pluggable backend services (e.g., Graphite, DataDog). StatsD is very popular and has become a de facto standard for collecting metrics. Open-source libraries are available in all popular languages to define and collect metrics. More information on StatsD can be found here - https://github.com/statsd/statsd
  • Using DogStatsD: DogStatsD is a custom daemon by DataDog. You can consider it an extension of StatsD with support for many more metric types. This daemon needs to be installed on the node where you need to collect metrics. If a DataDog agent is already installed on the node, then this daemon is started by default. DataDog also provides a Java library for interfacing with DogStatsD. More information can be found here - https://docs.datadoghq.com/developers/dogstatsd/
  • Using DataDog HTTP API: DataDog also exposes a REST API that can be used to push metrics to the DataDog server. But it does not make sense to push each and every metric using HTTP. We would need some kind of aggregator on the client side that collates all data for a time period and then makes an HTTP call to the DataDog server. https://docs.datadoghq.com/api/
  • Using DropWizard bridge: If you are already using the popular DropWizard metrics library, then the developers at Coursera have created a neat open-source library that acts as a bridge between DropWizard and DataDog - https://github.com/coursera/metrics-datadog
  • Using Micrometer Metrics Facade: If you are using Spring Boot, then this is the most seamless option available to you. Spring Boot Actuator has default support for the Micrometer facade library and already provides a DatadogMeterRegistry implementation that can be used to push metrics to DataDog. The advantage of using the Micrometer facade library is that we can switch to any other metrics backend easily - e.g. switching from DataDog to AWS CloudWatch. We can also use a composite registry (CompositeMeterRegistry) to publish the same metrics to multiple backends. 
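
To make the StatsD and DogStatsD options more concrete, here is a minimal sketch of the plain-text datagrams a client sends to the daemon: name:value|type, with DogStatsD adding tags after |#. It uses only the JDK; the metric names, the tag and the 127.0.0.1:8125 target are illustrative assumptions, and in practice you would use one of the client libraries mentioned above rather than hand-rolling this.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class StatsdSketch {

    /** Counter line in the plain StatsD format: name:value|c */
    static String counter(String name, long delta) {
        return name + ":" + delta + "|c";
    }

    /** Timer line in the plain StatsD format: name:millis|ms */
    static String timer(String name, long millis) {
        return name + ":" + millis + "|ms";
    }

    /** DogStatsD extension: append tags as |#key:value,key:value */
    static String withTags(String line, String... tags) {
        return tags.length == 0 ? line : line + "|#" + String.join(",", tags);
    }

    /** Fire-and-forget send over UDP to wherever the daemon/agent is listening. */
    static void send(String line, String host, int port) throws Exception {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        // 8125 is the conventional StatsD/DogStatsD port; adjust to your agent's setup.
        send(withTags(counter("orders.created", 1), "env:dev"), "127.0.0.1", 8125);
        send(timer("orders.latency", 42), "127.0.0.1", 8125);
    }
}
```

Because the transport is UDP, the application never blocks on the metrics pipeline, which is a large part of why this approach is cheap to add to a service.
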

We finally decided to use the Micrometer metrics library, as all our microservices were on Spring Boot. Spring Boot 2 has many OOTB metrics configured in Micrometer that are of tremendous value for DevOps teams - https://spring.io/blog/2018/03/16/micrometer-spring-boot-2-s-new-application-metrics-collector

Behind the scenes, the Micrometer DataDog registry uses the DataDog HTTP APIs to push metrics to the server. There is a background thread that collects/aggregates data and then makes a periodic call to the DataDog server. Perusing the following source code files would give a good overview of how this works: 
https://git.io/JfJDC
https://git.io/JfJD8

To configure DataDog in Spring Boot, you just need to set the following 2 properties. Note that .properties files have no inline comments, so the comments go on their own lines.
# DataDog API key
management.metrics.export.datadog.api-key=YOUR_KEY
# The interval at which metrics are sent to DataDog
management.metrics.export.datadog.step=30s

It is also very easy to implement micrometer code in Spring Boot. Sample code below: 
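
A minimal sketch of what such instrumentation can look like is shown here. It is not the original sample; the class name, metric names and tag are hypothetical, and it needs micrometer-core on the classpath. SimpleMeterRegistry is used only so the example runs standalone; in a Spring Boot service the auto-configured MeterRegistry would be injected instead.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class OrderMetrics {

    private final Counter ordersPlaced;
    private final Timer orderLatency;

    // In a Spring Boot app, the MeterRegistry can be constructor-injected.
    public OrderMetrics(MeterRegistry registry) {
        this.ordersPlaced = Counter.builder("orders.placed")
                .description("Number of orders accepted")
                .tag("channel", "web")
                .register(registry);
        this.orderLatency = Timer.builder("orders.latency")
                .description("Time taken to place an order")
                .register(registry);
    }

    /** Counts the call and records how long the (placeholder) business logic takes. */
    public String placeOrder(String orderId) {
        ordersPlaced.increment();
        return orderLatency.record(() -> "confirmed:" + orderId);
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        OrderMetrics metrics = new OrderMetrics(registry);
        System.out.println(metrics.placeOrder("42"));
        System.out.println(registry.get("orders.placed").counter().count());
    }
}
```

Since the code only talks to the MeterRegistry interface, the same class works unchanged whether the metrics end up in DataDog or in another backend.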

Wednesday, April 15, 2020

Kafka poll() vs heartbeat()

In older versions of Kafka, the consumer was responsible for polling the broker frequently to prove that it was still alive. If the consumer did not poll() within a specified time limit, then the broker considered that consumer to be dead and started rebalancing its partitions to other consumers.

But in the latest versions of the Kafka consumer, a dedicated background heartbeat thread is started. This heartbeat thread sends periodic heartbeats to the broker to say - "Hey, I am alive and kicking! I am processing messages and will poll() again soon".

Thus the newer versions of Kafka decouple the polling and heartbeat functionality. So now we have two threads running: the heartbeat thread and the processing (polling) thread.
The heartbeat frequency is defined by the heartbeat.interval.ms property (default = 3 secs). If the broker receives no heartbeat within session.timeout.ms (default = 10 secs in the versions discussed here), it considers the consumer dead.

Since there is a separate heartbeat thread now, the time allowed between poll() calls is governed by its own property, max.poll.interval.ms. The plain Kafka consumer defaults it to 5 minutes, but some Kafka Streams versions set the default to Integer.MAX_VALUE.
With that setting, no matter how long the processing takes (on the processing/polling thread), the Kafka broker will never consider the consumer to be dead. Only if no poll() request is received within Integer.MAX_VALUE milliseconds (roughly 24.8 days) would the consumer be considered dead.
Caveat: With such a large polling timeout, if your processing has a bug (e.g. an infinite loop, or a call to a third-party web service that is stuck), then the consumer will never be pronounced dead and messages will start piling up in that partition. Hence, it may be a good idea to set a realistic value for max.poll.interval.ms, so that the partition can be rebalanced to other consumers. 
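
As a sketch of what "realistic" settings might look like, the snippet below collects the relevant consumer properties in a plain java.util.Properties object, which is what you would pass to the KafkaConsumer constructor. The property keys are standard Kafka consumer settings; the concrete values, the group id and the broker address are only illustrative assumptions and should be tuned to how long one batch really takes to process.

```java
import java.util.Properties;

public class ConsumerTimeouts {

    /** Builds consumer settings with explicit liveness timeouts (values are examples only). */
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Heartbeat thread: how often heartbeats are sent, and how long the
        // broker waits without one before declaring the consumer dead.
        props.setProperty("heartbeat.interval.ms", "3000");
        props.setProperty("session.timeout.ms", "10000");

        // Processing thread: maximum gap allowed between two poll() calls.
        // If processing hangs longer than this, the partitions are rebalanced.
        props.setProperty("max.poll.interval.ms", "300000");

        // Smaller batches make it easier to stay within max.poll.interval.ms.
        props.setProperty("max.poll.records", "100");
        return props;
    }

    public static void main(String[] args) {
        consumerProps("localhost:9092", "orders-group").forEach(
                (key, value) -> System.out.println(key + "=" + value));
    }
}
```

The heartbeat interval should stay well below the session timeout (a third or less is the usual guidance), so that a couple of lost heartbeats do not trigger a rebalance.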

The following 2 Stack Overflow discussions were extremely helpful in understanding the above.

https://stackoverflow.com/questions/47906485/max-poll-intervals-ms-set-to-int-max-by-default
https://stackoverflow.com/questions/39730126/difference-between-session-timeout-ms-and-max-poll-interval-ms-for-kafka-0-10-0


Wednesday, January 22, 2020

Converting Java libraries to .NET DLLs

If you have a nifty Java library that you love and want to use in your .NET program, then have a look at this useful toolkit called IKVM.NET - https://www.ikvm.net/uses.html

ikvmc -target:library {mylib.jar} ------- will create mylib.dll

Java libraries for SSH and Powershell automation

If you are doing some basic automation and want to execute commands on Linux or Windows, then the following open source libraries would help.

JSCH : http://www.jcraft.com/jsch/
JSch is a pure Java implementation of SSH2. Once you connect to a Linux server, you can execute commands on it. A good tutorial is available here - https://linuxconfig.org/executing-commands-on-a-remote-machine-from-java-with-jsch

jPowerShell:  https://github.com/profesorfalken/jPowerShell
This is a simple Java API that allows you to interact with the PowerShell console. Sample code below: