Wednesday, October 11, 2023

Mock data and APIs

Mocking APIs and synthetic mock data generation are invaluable techniques to speed up development. We recently used the Mockaroo platform and found it quite handy to generate dummy data and mock APIs. 

https://www.mockaroo.com/

IBM has also kindly released ~25M records of synthetic financial transacation data that can be used during application development or ML training.

https://github.com/IBM/TabFormer

Other examples of mock data generation tools are:

Leveraging Graph Databases for Fraud Detection

 There are many techniques for building Fraud Detection systems. It can be:

  • Rule Based (tribal knowledge codified)
  • Machine Learning (detect anomalies, patterns, etc.)
There is a third technique using Graph Databases such as Neo4J, TigerGraph or Amazon Neptune
A graph network can assist identify hidden aspects of transactions that would otherwise be missed just by looking at data in a relational table.

Lets consider the example of indenfying fraud in a simple financial transaction. Every financial transaction has thousands of attributes associated with is - e.g. amount, IP address, browser, OS, cookie data, bank, geo-location, card details, recepient,etc.
Using a graph database, we can build a graph network where each transaction is a node and the line connections (aka edges) represent the attributes of the transaction. The following article gives a good primer on how this kind of network would look like - https://towardsdatascience.com/fraud-through-the-eyes-of-a-machine-1dd994405e6e

Once the graph is created, there are many techniques that can be used to detect patterns and relationships between the different attributes. 
  • Link Analysis: This approach is used to detect unusual links between network items. In a financial network, for example, you may check for linkages between accounts engaged in fraudulent activities.
  • Anomaly detection: This approach is used to identify entities or transactions that differ from usual behaviour in a network. In a credit card network, for example, you may watch for transactions performed from strange areas or for abnormally big sums.
  • Cluster Analysis:  This technique is used to identify groups of entities in a network that are closely connected to each other. Clustering may also be used to surface commercial ties or social circles in a transaction banking graph.
Thus, by employing graph analytics, we may detect clusters and links in their data, revealing previously unknown possible fraud connections. More information on such techniques can be found on this blog: https://www.cylynx.io/blog/network-analytics-for-fraud-detection-in-banking-and-finance/

Because of their capacity to track complicated chains of transactions, graph databases are particularly useful in financial crime use cases and fraud detection graph analysis. Traditional RDBMS struggle with these sequence of connections because multiple recursive inner joins are necessary to accomplish this sort of traversal query in SQL, which is very challenging. 

A few articles that give good illustrations on this topic:

Friday, October 06, 2023

Defensive measures for LLM prompts

To prevent abusive prompts and prompt hacking, we need to leverage certain techniques such as Filtering, Post-Prompting, random enclosures, content moderation, etc.

A good explanation of these techniques is given here -- https://learnprompting.org/docs/category/-defensive-measures