Metablog

A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation [pdf]

This paper presents a methodology for using LLVM-based tools to tune the DCA++ (dynamical cluster approximation) application that targets the new ARM A64FX processor. The goal is to describe the changes required for the new architecture and generate efficient single instruction/multip … | Continue reading

@arxiv.org | 2 years ago

Neural Network Training with Highly Incomplete Datasets

Neural network training and validation rely on the availability of large high-quality datasets. However, in many cases only incomplete datasets are available, particularly in health care applications, where each patient typically undergoes different clinical procedures or can drop o … | Continue reading

@arxiv.org | 2 years ago

Productivity, Portability, Performance: Data-Centric Python

Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python support in High Performance Computing (HPC) has … | Continue reading

@arxiv.org | 2 years ago

Mission Impossible: Securing Master Keys

Securing a secret master key is a non-trivial task, we even argue it is impossible to fully secure it, hence we must make it as difficult as possible for any powerful adversary to steal or use the key. We introduce the reader to interesting cryptography which is starting to get more … | Continue reading

@arxiv.org | 2 years ago

Reinforcement Learning as One Big Sequence Modeling Problem

Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models, leveraging the Markov property to factorize the problem in time. However, we can also view RL as a sequence modeling problem, with the goal being to predict a sequence of a … | Continue reading

@arxiv.org | 2 years ago

Toward Efficient Interactions Between Python and Native Libraries

Python has become a popular programming language because of its excellent programmability. Many modern software packages utilize Python for high-level algorithm design and depend on native libraries written in C/C++/Fortran for efficient computation kernels. Interaction between Pyth … | Continue reading

@arxiv.org | 2 years ago

Scrooge Attack: Undervolting ARM Processors for Profit

Latest ARM processors are approaching the computational power of x86 architectures while consuming much less energy. Consequently, supply follows demand with Amazon EC2, Equinix Metal and Microsoft Azure offering ARM-based instances, while Oracle Cloud Infrastructure is about to add … | Continue reading

@arxiv.org | 2 years ago

Multi-Horizon Forecasting for Limit Order Books

We design multi-horizon forecasting models for limit order book (LOB) data by using deep learning techniques. Unlike standard structures where a single prediction is made, we adopt encoder-decoder models with sequence-to-sequence and Attention mechanisms, to generate a forecasting p … | Continue reading

@arxiv.org | 2 years ago

My Favorite Integer Sequences (2002) [pdf]

This paper gives a brief description of the author's database of integer sequences, now over 35 years old, together with a selection of a few of the most interesting sequences in the table. Many unsolved problems are mentioned. | Continue reading

@arxiv.org | 2 years ago

Understanding the growth of the Fediverse through the lens of Mastodon

Open-source, Decentralized Online Social Networks (DOSNs) are emerging as alternatives to the popular yet centralized and profit-driven platforms like Facebook or Twitter. In DOSNs, users can set up their own server, or instance, while they can actually interact with users of other … | Continue reading

@arxiv.org | 2 years ago

Mathematics Is Physics

In this essay, I argue that mathematics is a natural science---just like physics, chemistry, or biology---and that this can explain the alleged "unreasonable" effectiveness of mathematics in the physical sciences. The main challenge for this view is to explain how mathematical theor … | Continue reading

@arxiv.org | 2 years ago

Energy-Based Models for Code Generation Under Compilability Constraints

Neural language models can be successfully trained on source code, leading to applications such as code completion. However, their versatile autoregressive self-supervision objective overlooks important global sequence-level features that are present in the data such as syntactic co … | Continue reading

@arxiv.org | 2 years ago

PathQuery, Google's Graph Query Language

We introduce PathQuery, a graph query language developed to scale with Google's query and data volumes as well as its internal developer community. PathQuery supports flexible and declarative semantics. We have found that this enables query developers to think in a naturally "graphy … | Continue reading

@arxiv.org | 2 years ago

A Video Conferencing Manipulation-Detection System

The last-generation video conferencing software allows users to utilize a virtual background to conceal their personal environment due to privacy concerns, especially in official meetings with other employers. On the other hand, users maybe want to fool people in the meeting by cons … | Continue reading

@arxiv.org | 2 years ago

An Incentive Mechanism for Trading Personal Data in Data Markets

With the proliferation of the digital data economy, digital data is considered as the crude oil in the twenty-first century, and its value is increasing. Keeping pace with this trend, the model of data market trading between data providers and data consumers, is starting to emerge a … | Continue reading

@arxiv.org | 2 years ago

Data Poisoning Won't Save You from Facial Recognition

Data poisoning has been proposed as a compelling defense against facial recognition models trained on Web-scraped pictures. By perturbing the images they post online, users can fool models into misclassifying future (unperturbed) pictures. We demonstrate that this strategy provides … | Continue reading

@arxiv.org | 2 years ago

Multi-task curriculum learning in a complex, hard-exploration domain: Minecraft

An important challenge in reinforcement learning is training agents that can solve a wide variety of tasks. If tasks depend on each other (e.g. needing to learn to walk before learning to run), curriculum learning can speed up learning by focusing on the next best task to learn. We … | Continue reading

@arxiv.org | 2 years ago

Timekeeping Infrastructure for the Catalina Sky Survey

Time domain science forms an increasing fraction of astronomical programs at many facilities. Synoptic and targeted observing modes of transient, varying, and moving sources rely on precise clocks to provide the underlying time tags. Often precision is mistaken for accuracy, or the … | Continue reading

@arxiv.org | 2 years ago

Charformer: Fast Character Transformers via Gradient-Based Subword Tokenization

State-of-the-art models in natural language processing rely on separate rigid subword tokenization algorithms, which limit their generalization ability and adaptation to new settings. In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end … | Continue reading

@arxiv.org | 2 years ago

The power of quantum neural networks (2020)

Fault-tolerant quantum computers offer the promise of dramatically improving machine learning through speed-ups in computation or improved model scalability. In the near-term, however, the benefits of quantum machine learning are not so clear. Understanding expressibility and traina … | Continue reading

@arxiv.org | 2 years ago

A Pure HTTP/3 Alternative to MQTT-over-QUIC in Resource-Constrained IoT

In this letter, we address the issue of scalable and timely dissemination of information in resource-constrained IoT networks. The scalability is addressed by adopting a publish-subscribe architecture. To address the timely dissemination, we propose an HTTP/3 (H3) publish-subscribe s … | Continue reading

@arxiv.org | 2 years ago

Breaking the O(n)-Barrier in the Construction of Compressed Suffix Arrays

The suffix array, describing the lexicographic order of suffixes of a given text, is the central data structure in string algorithms. The suffix array of a length-$n$ text uses $Θ(n \log n)$ bits, which is prohibitive in many applications. To address this, Grossi and Vitter [STOC 20 … | Continue reading

@arxiv.org | 2 years ago

Information Theory: A Tutorial Introduction

Shannon's mathematical theory of communication defines fundamental limits on how much information can be transmitted between the different components of any man-made or biological system. This paper is an informal but rigorous introduction to the main ideas implicit in Shannon's the … | Continue reading

@arxiv.org | 2 years ago

Hypergraph expanders of all uniformities from Cayley graphs

Hypergraph expanders are hypergraphs with surprising, non-intuitive expansion properties. In a recent paper, the first author gave a simple construction, which can be randomized, of $3$-uniform hypergraph expanders with polylogarithmic degree. We generalize this construction, giving … | Continue reading

@arxiv.org | 2 years ago

Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network

Adaption of end-to-end speech recognition systems to new tasks is known to be challenging. A number of solutions have been proposed which apply external language models with various fusion methods, possibly with a combination of two-pass decoding. Also TTS systems have been used to … | Continue reading

@arxiv.org | 2 years ago

OpenML-Python: An Extensible Python API for OpenML

OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. In this paper we introduce OpenML-Python, a client API for Python, opening up the OpenML platform for a wide range of Python-based too … | Continue reading

@arxiv.org | 2 years ago

Gender Differences in Scientific Careers: A Large-Scale Bibliometric Analysis

We present a large-scale bibliometric analysis of gender differences in scientific careers, covering all scientific disciplines and a large number of countries worldwide. We take a longitudinal perspective in which we trace the publication careers of almost six million male and fema … | Continue reading

@arxiv.org | 2 years ago

Beer mats make bad Frisbees

In this article we show why flying and rotating beer mats, CDs, or other flat disks will eventually flip in the air and end up flying with backspin, thus, making them unusable as frisbees. The crucial effect responsible for the flipping is found to be the lift attacking not in the c … | Continue reading

@arxiv.org | 2 years ago

Unsupervised discovery of features, patterns, via variational autoencoders

Recent advances in scanning tunneling and transmission electron microscopies (STM and STEM) have allowed routine generation of large volumes of imaging data containing information on the structure and functionality of materials. The experimental data sets contain signatures of long- … | Continue reading

@arxiv.org | 2 years ago

Real-Time Neural Radiance Caching for Path Tracing

We present a real-time neural radiance caching method for path-traced global illumination. Our system is designed to handle fully dynamic scenes, and makes no assumptions about the lighting, geometry, and materials. The data-driven nature of our approach sidesteps many difficulties … | Continue reading

@arxiv.org | 2 years ago

LegoFormer: Transformers for Block-by-Block Multi-View 3D Reconstruction

Most modern deep learning-based multi-view 3D reconstruction techniques use RNNs or fusion modules to combine information from multiple images after encoding them. These two separate steps have loose connections and do not consider all available information while encoding each view. … | Continue reading

@arxiv.org | 2 years ago

Multiplying Matrices Without Multiplying

Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm for this task that greatly outperforms exist … | Continue reading

@arxiv.org | 2 years ago

Quantum computing 40 years later

Forty years ago, Richard Feynman proposed harnessing quantum physics to build a more powerful kind of computer. Realizing Feynman's vision is one of the grand challenges facing 21st century science and technology. In this article, we'll recall Feynman's contribution that launched th … | Continue reading

@arxiv.org | 2 years ago

Regularization Is All You Need: Simple Neural Nets Can Excel on Tabular Data

Tabular datasets are the last "unconquered castle" for deep learning, with traditional ML methods like Gradient-Boosted Decision Trees still performing strongly even against recent specialized neural architectures. In this paper, we hypothesize that the key to boosting the performan … | Continue reading

@arxiv.org | 2 years ago

GAN Prior Embedded Network for Blind Face Restoration in the Wild

Blind face restoration (BFR) from severely degraded face images in the wild is a very challenging problem. Due to the high illness of the problem and the complex unknown degradation, directly training a deep neural network (DNN) usually cannot lead to acceptable results. Existing ge … | Continue reading

@arxiv.org | 2 years ago

Empirical Study into Absence of Consent to Third-Party Tracking in Android Apps

Third-party tracking allows companies to collect users' behavioural data and track their activity across digital devices. This can put deep insights into users' private lives into the hands of strangers, and often happens without users' awareness or explicit consent. EU and UK data … | Continue reading

@arxiv.org | 2 years ago

Efficient and Robust Lidar-Based End-to-End Navigation

Deep learning has been used to demonstrate end-to-end neural network learning for autonomous vehicle control from raw sensory input. While LiDAR sensors provide reliably accurate information, existing end-to-end driving solutions are mainly based on cameras since processing 3D data … | Continue reading

@arxiv.org | 2 years ago

The halting problem is decidable on a set of asymptotic probability one

@arxiv.org | 2 years ago

Study of the Behaviors, Practices, and Motivations of TikTok Social Activists

Social media platforms such as Facebook and Twitter are used for social activism purposes, and TikTok is no different. We conducted 9 qualitative semi-structured interviews with social activists who recently posted their videos on TikTok to understand. This study presents an initial … | Continue reading

@arxiv.org | 2 years ago

Estimating CCTV density (cameras per km) in the world using Street View data

The use of video surveillance in public spaces -- both by government agencies and by private citizens -- has attracted considerable attention in recent years, particularly in light of rapid... | Continue reading

@arxiv.org | 2 years ago

Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts

Word embedding is a Natural Language Processing (NLP) technique that automatically maps words from a vocabulary to vectors of real numbers in an embedding space. It has been widely used in recent... | Continue reading

@arxiv.org | 2 years ago

Thinking Like Transformers

What is the computational model behind a Transformer? Where recurrent neural networks have direct parallels in finite state machines, allowing clear discussion and thought around architecture... | Continue reading

@arxiv.org | 2 years ago

SimSwap: An Efficient Framework for High Fidelity Face Swapping

We propose an efficient framework, called Simple Swap (SimSwap), aiming for generalized and high fidelity face swapping. In contrast to previous approaches that either lack the ability to... | Continue reading

@arxiv.org | 2 years ago

Scaling Vision Transformers

Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of-the-art results on many computer vision benchmarks. Scale is a primary ingredient in attaining... | Continue reading

@arxiv.org | 2 years ago

Two notes on notation (Donald E. Knuth, 1992)

@arxiv.org | 2 years ago

Fast and efficient training of large language models

The advent of the transformer has sparked a quick growth in the size of language models, far outpacing hardware improvements. (Dense) transformers are expected to reach the trillion-parameter... | Continue reading

@arxiv.org | 2 years ago

Basins with Tentacles

To explore basin geometry in high-dimensional dynamical systems, we consider a ring of identical Kuramoto oscillators. Many attractors are known to coexist in this system; each is a twisted... | Continue reading

@arxiv.org | 2 years ago

Markpainting: Adversarial Machine Learning Meets Inpainting

Inpainting is a learned interpolation technique that is based on generative modeling and used to populate masked or missing pieces in an image; it has wide applications in picture editing and... | Continue reading

@arxiv.org | 2 years ago
