New class of hardware-level fault-tolerant quantum-computing devices

We present measurements and simulations of semiconductor-superconductor heterostructure devices that are consistent with the observation of topological superconductivity and Majorana zero modes. The devices are fabricated from high-mobility two-dimensional electron gases in which qu …


@arxiv.org | 1 year ago

Pile of Law: Learning Responsible Data Filtering from the Law

One concern with the rise of large language models lies with their potential for significant harm, particularly from pretraining on biased, obscene, copyrighted, and private information. Emerging ethical approaches have attempted to filter pretraining material, but such approaches h …


@arxiv.org | 1 year ago

SATAn: Air-Gap Exfiltration Attack via Radio Signals from SATA Cables

This paper introduces a new type of attack on isolated, air-gapped workstations. Although air-gap computers have no wireless connectivity, we show that attackers can use the SATA cable as a wireless antenna to transfer radio signals at the 6 GHz frequency band. The Serial ATA (SATA) …


@arxiv.org | 1 year ago

Towards Grand Unification of Object Tracking

We present a unified method, termed Unicorn, that can simultaneously solve four tracking problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters. Due to the fragmented definitions of the object tracking problem itself, most existing trackers are develope …


@arxiv.org | 1 year ago

Characterization of JWST science performance from commissioning

This document characterizes the actual science performance of the James Webb Space Telescope (JWST), as known on 12 July 2022. Following six months of commissioning to prepare JWST for science operations, the observatory is now fully capable of achieving the discoveries for which it …


@arxiv.org | 1 year ago

Real-Time LSM-Trees for HTAP Workloads

Real-time analytics systems employ hybrid data layouts in which data are stored in different formats throughout their lifecycle. Recent data are stored in a row-oriented format to serve OLTP workloads and support high insert rates, while older data are transformed to a column-orient …


@arxiv.org | 1 year ago

XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically only uses one type of feature memory. For videos longer than a minute, a sin …


@arxiv.org | 1 year ago

YOLOv7: Trainable Bag-of-Freebies

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100. YOLOv7-E6 object detector (56 FPS V100, 55.9% AP) outperforms …


@arxiv.org | 1 year ago

Neural Networks and the Chomsky Hierarchy by DeepMind

Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (2200 models, 16 tasks) to investigate …


@arxiv.org | 1 year ago

A Generalization of Otsu's Method and Minimum Error Thresholding (2020)

We present Generalized Histogram Thresholding (GHT), a simple, fast, and effective technique for histogram-based image thresholding. GHT works by performing approximate maximum a posteriori estimation of a mixture of Gaussians with appropriate priors. We demonstrate that GHT subsume …


@arxiv.org | 1 year ago

Do Ideas Have Shape?

We show that ResNets converge, in the infinite depth limit, to a generalization of image registration algorithms. In this generalization, images are replaced by abstractions (ideas) living in high dimensional RKHS spaces, and material points are replaced by data points. Whereas comp …


@arxiv.org | 1 year ago

Re2G: Retrieve, Rerank, Generate

As demonstrated by GPT-3 and T5, transformers grow in capability as parameter spaces become larger and larger. However, for tasks that require a large amount of knowledge, non-parametric memory allows models to grow dramatically with a sub-linear increase in computational cost and G …


@arxiv.org | 1 year ago

Functional or imperative? On pleasant semantics for differentiable programming

In machine learning (ML), researchers and engineers seem to be at odds. System implementers would prefer models to be declarative, with detailed type information and semantic restrictions that allow models to be optimised, rearranged and parallelised. Yet practitioners show an overw …


@arxiv.org | 1 year ago

Mathematical Proof Between Generations

A proof is one of the most important concepts of mathematics. However, there is a striking difference between how a proof is defined in theory and how it is used in practice. This puts the unique status of mathematics as exact science into peril. Now may be the time to reconcile the …


@arxiv.org | 1 year ago

How to create an artificial magnetosphere for Mars

If humanity is ever to consider substantial, long-term colonization of Mars, the resources needed are going to be extensive. For a long-term human presence on Mars to be established, serious thought would need to be given to terraforming the planet. One major requirement for such te …


@arxiv.org | 1 year ago

Language Models (Mostly) Know What They Know

We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the ri …


@arxiv.org | 1 year ago

The Web Is Your Oyster – Knowledge-Intensive NLP Against a Large Web Corpus

In order to address increasing demands of real-world applications, the research for knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web-scale knowledge, lack of structure, inconsistent quality and noise. To this end, we …


@arxiv.org | 1 year ago

On the Principles of Parsimony and Self-Consistency for Emergence of Intelligence

Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, th …


@arxiv.org | 1 year ago

TPU-KNN: K Nearest Neighbor Search at Peak Flop/S

This paper presents a novel nearest neighbor search algorithm achieving TPU (Google Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms with similar level of recall. The design of the proposed algorithm is motivated by an accurate accelerator perf …


@arxiv.org | 1 year ago

DR-STRaNGe: End-to-End System Design for DRAM-Based True RNGs

Random number generation is an important task in a wide variety of critical applications including cryptographic algorithms, scientific simulations, and industrial testing tools. True Random Number Generators (TRNGs) produce truly random data by sampling a physical entropy source th …


@arxiv.org | 1 year ago

Supervised Learning for Coverage-Directed Test Selection

Constrained random test generation is one of the most widely adopted methods for generating stimuli for simulation-based verification. Randomness leads to test diversity, but tests tend to repeatedly exercise the same design logic. Constraints are written (typically manually) to bia …


@arxiv.org | 1 year ago

Breaking the Warp Barrier: Hyper-Fast Solitons in Einstein-Maxwell-Plasma Theory

Solitons in space-time capable of transporting time-like observers at superluminal speeds have long been tied to violations of the weak, strong, and dominant energy conditions of general relativity. The negative-energy sources required for these solitons must be created through ene …


@arxiv.org | 1 year ago

No Token Left Behind: Explainability-Aided Image Classification and Generation

The application of zero-shot learning in computer vision has been revolutionized by the use of image-text matching models. The most notable example, CLIP, has been widely used for both zero-shot classification and guiding generative models with a text prompt. However, the zero-shot …


@arxiv.org | 1 year ago

A Prolog assisted search for new simple Lie algebras

We describe some recent computer investigations with the `Constraint Logic Propagation over Finite Domains' -- CLP(FD) -- library in the Prolog programming environment to search for new simple Lie algebras over the field $\GF(2)$ of $2$ elements. Motivated by a paper of Grishkov et. …


@arxiv.org | 1 year ago

Closing the B-Tree/LSM-Tree Write Amplification Gap on Modern Storage Hardware

This paper studies the design of B-tree that can take full advantage of modern storage hardware with built-in transparent compression. Recent years have witnessed significant interest in applying log-structured merge tree (LSM-tree) as an alternative to B-tree. The current consensus …


@arxiv.org | 1 year ago

Programming with union, intersection, and negation types

In this essay, I present the advantages and, I dare say, the beauty of programming in a language with set-theoretic types, that is, types that include union, intersection, and negation type connectives. I show by several examples how set-theoretic types are necessary to type some co …


@arxiv.org | 1 year ago

General-relativistic thin-shell Dyson mega-spheres

Loosely inspired by the somewhat fanciful notion of detecting an arbitrarily advanced alien civilization, we consider a general-relativistic thin-shell Dyson mega-sphere completely enclosing a central star-like object, and perform a full general-relativistic analysis using the Israe …


@arxiv.org | 1 year ago

SoK: Blockchain Governance

Blockchain systems come with a promise of decentralization that often stumbles on a roadblock when key decisions about modifying the software codebase need to be made. This is attested by the fact that both of the two major cryptocurrencies, Bitcoin and Ethereum, have undergone hard …


@arxiv.org | 1 year ago

Viability of quantum communication across interstellar distances

The possibility of achieving quantum communication using photons across interstellar distances is examined. For this, different factors are considered that could induce decoherence of photons, including the gravitational field of astrophysical bodies, the particle content in the int …


@arxiv.org | 1 year ago

How Much More Data Do I Need? Estimating Requirements for Downstream Tasks

Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance? This question is of critical importance in applications such as autonomous driving or medical imaging where collecting data is expensive and ti …


@arxiv.org | 1 year ago

Location reference recognition from texts: A survey and comparison

A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to the process of recognizing location references from texts and identifying their g …


@arxiv.org | 1 year ago

How to Train BERT with an Academic Budget

While large language models a la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 2 …


@arxiv.org | 1 year ago

Software Mitigation of RISC-V Spectre Attacks

Speculative attacks are still an active threat today that, even if initially focused on the x86 platform, reach across all modern hardware architectures. RISC-V is a newly proposed open instruction set architecture that has seen traction from both the industry and academia in recent …


@arxiv.org | 1 year ago

GitHub Copilot AI Pair Programmer: Asset or Liability?

Automatic program synthesis is a long-lasting dream in software engineering. Recently, a promising Deep Learning (DL) based solution, called Copilot, has been proposed by Open AI and Microsoft as an industrial product. Although some studies evaluate the correctness of Copilot soluti …


@arxiv.org | 1 year ago

A computer assisted proof for 100k years stability of the solar system

We present an analytical proof assisted by computer calculations for the dynamical stability of the eight main planets and Pluto for the next 100,000 years. It means that the semi-major axes of the planets will not change significantly during this period. Also the eccentricities and …


@arxiv.org | 1 year ago

Classical Simulation of Quantum Supremacy Circuits

It is believed that random quantum circuits are difficult to simulate classically. These have been used to demonstrate quantum supremacy: the execution of a computational task on a quantum computer that is infeasible for any classical computer. The task underlying the assertion of q …


@arxiv.org | 1 year ago

Beyond neural scaling laws: beating power law scaling via data pruning

Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and ener …


@arxiv.org | 1 year ago

A Survey on Machine Learning Techniques for Source Code Analysis

Context: The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis such as testing and vulnerabilities detection. A large number of studies poses challenges to the com …


@arxiv.org | 1 year ago

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neur …


@arxiv.org | 1 year ago

Human-Following and -guiding in Crowded Environments by Mobile Service Robots

Assistance robots have gained widespread attention in various industries such as logistics and human assistance. The tasks of guiding or following a human in a crowded environment such as airports or train stations to carry weight or goods is still an open problem. In these use case …


@arxiv.org | 1 year ago

DeepMind: Mastering the Game of Stratego with Model-Free Multiagent RL

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an eno …


@arxiv.org | 1 year ago

A fast in-place interpreter for WebAssembly

WebAssembly (Wasm) is a compact, well-specified bytecode format that offers a portable compilation target with near-native execution speed. The bytecode format was specifically designed to be fast to parse, validate, and compile, positioning itself as a portable alternative to nativ …


@arxiv.org | 1 year ago

Simple and Effective Multi-Sentence TTS with Expressive and Coherent Prosody

Generating expressive and contextually appropriate prosody remains a challenge for modern text-to-speech (TTS) systems. This is particularly evident for long, multi-sentence inputs. In this paper, we examine simple extensions to a Transformer-based FastSpeech-like system, with the g …


@arxiv.org | 1 year ago

The Mathematics of Burger Flipping

What is the most effective way to grill food? Timing is everything, since only one surface is exposed to heat at a given time. Should we flip only once, or many times? We present a simple model of cooking by flipping, and some interesting observations emerge. The rate of cooking dep …


@arxiv.org | 1 year ago

Pen and Paper Exercises in Machine Learning

This is a collection of (mostly) pen-and-paper exercises in machine learning. The exercises are on the following topics: linear algebra, optimisation, directed graphical models, undirected graphical models, expressive power of graphical models, factor graphs and message passing, inf …


@arxiv.org | 1 year ago

Beats: An Open-Source, High-Precision, Multi-Channel EEG Acquisition Tool System

Stable and accurate electroencephalogram (EEG) signal acquisition is fundamental in non-invasive brain-computer interface (BCI) technology. Commonly used EEG acquisition system's hardware and software are usually closed-source. Its inability to flexible expansion and secondary devel …


@arxiv.org | 1 year ago

Identifying Winners in Ranked Choice Voting Elections with Outstanding Ballots

Several election districts in the US have recently moved to ranked-choice voting (RCV) to decide the results of local elections. RCV allows voters to rank their choices, and the results are computed in rounds, eliminating one candidate at a time. RCV ensures fairer elections and has …


@arxiv.org | 1 year ago