Wastholm.com

This is a list of (Fuzzy) Data Matching software. The software in this list is open source and/or freely available.

The term data matching is used to indicate the procedure of bringing together information from two or more records that are believed to belong to the same entity. Data matching has two applications: (1) to match data across multiple datasets (linkage) and (2) to match data within a dataset (deduplication). See the Wikipedia page about data matching for more information.

Similar terms: record linkage, data matching, deduplication, fuzzy matching, entity resolution

Suppose we want to combine a BERT-based named entity recognition (NER) model with a rule-based NER model built on top of spaCy. Although BERT's NER exhibits extremely high performance, it is usually combined with rule-based approaches for practical purposes. In such cases, what often bothers us is that tokens of spaCy and BERT are different, even if the input sentences are the same. For example, let's say the input sentence is "John Johanson 's house"; BERT tokenizes this sentence like ["john", "johan", "##son", "'", "s", "house"] and spaCy tokenizes it like ["John", "Johanson", "'s", "house"]. To combine the outputs, we need to calculate the correspondence between the two different token sequences. This correspondence is the "alignment".

The Configuration as Code plugin is an opinionated way to configure Jenkins based on human-readable declarative configuration files. Writing such a file should be feasible without being a Jenkins expert, just translating into code a configuration process one is used to executing in the web UI.

Simple command line tool for text to image generation using OpenAI's CLIP and Siren.

Audio on Unix is a little zoo, there are so many acronyms for projects and APIs that it's easy to get lost. Let's tackle that issue! Most articles are confusing because they either use audio technical jargon, or because they barely scratch the surface and leave people clueless. A little knowledge can be dangerous.

In this article I'll try to bridge the gap by not requiring any prerequisite knowledge while also giving a good overview of the whole Unix audio landscape. There's going to be enough details to remove mysticism (Oh so pernicious in web bubbles) and see how the pieces fit.

WHIP is a high performance web application server based on the excellent httpbeast and routing provided by nest with some additional optimizations.

WHIP is still in development and is not recommended for production use. Much is still missing or untested but for basic API use cases however, the performance numbers look pretty good so far.

Free and Open Source Machine Translation API, entirely self-hosted. Unlike other APIs, it doesn't rely on proprietary providers such as Google or Azure to perform translations.

An open source research project exploring the role of machine learning as a tool in the creative process.

Axe is an accessibility testing engine for websites and other HTML-based user interfaces. It's fast, secure, lightweight, and was built to seamlessly integrate with any existing test environment so you can automate accessibility testing alongside your regular functional testing.

The SOUL project is creating a new language and infrastructure for writing and deploying audio code. It aims to unlock improvements in latency, performance, portability and ease-of-development that aren't possible with the current mainstream techniques that are being used.

|< First   < Previous   11–20 (405)   Next >   Last >|