Big Data Engineer

Job Description

About Foursquare:
Since our inception in 2009, Foursquare has been a leading force in changing how location information enriches our real-world and digital lives. As a location intelligence company, Foursquare comprises two well-known consumer apps, Foursquare and Swarm, as well as thriving media and enterprise products. Our B2B offerings include Places (for developers), Pinpoint and Attribution (for marketers), and Place Insights (for analysts, based on the world's largest foot traffic panel). With more than 200 people across our offices in New York and San Francisco, and in sales offices around the globe, we're dedicated to our trailblazing mission: enriching consumer experiences and informing business decisions with location intelligence.

About the Infrastructure Team:
As a member of Foursquare's Infrastructure team, you will use your strong background in distributed systems to help build the core online and offline platforms that power the services behind our enterprise- and consumer-facing products. We're passionate about tackling tough infrastructure challenges (especially scaling problems), and we look for others who like to dive deep into the code to help solve hard problems. You should be comfortable running with your own ideas and eager to learn new skills on a bleeding-edge platform. We use a variety of tools, technologies, and languages to build software (e.g., Scala, Hadoop, Python, Thrift, MongoDB, Memcached, Redis, Kafka, Chef, Aurora, Mesos, RocksDB, Luigi, Pants), but experience with equivalents will do just fine.
Join us to help build and maintain the rock-solid core infrastructure on which we build our features. Here are some high-level areas you could get involved in:
- Rebuilding our proxy tier to support more advanced load-balancing algorithms
- Improving the speed at which we can reliably and continuously deploy our backend services
- Building a cost-effective and seamless way to run pipelines across our on-premises Hadoop cluster and Amazon EMR
- Using Kafka and RocksDB to build a highly available, low-latency, low-footprint key-value store that can be deployed alongside business logic services in our Aurora cluster (a minimal sketch of this pattern follows the list)
- Building tools to analyze and optimize CPU, core, memory, and disk utilization of services that run on our Aurora and Hadoop clusters
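
To make the key-value store bullet concrete, here is a minimal sketch of the pattern it describes: a Kafka topic acting as the replicated write log, with an embedded RocksDB instance as the local read store. This is an illustration under assumptions, not Foursquare's actual implementation; the topic name `kv-updates`, the broker address, the data directory, and the byte-array encoding are all hypothetical, and a real deployment would add offset checkpointing, log compaction, and failure handling.

```scala
import java.time.Duration
import java.util.{Collections, Properties, UUID}

import org.apache.kafka.clients.consumer.KafkaConsumer
import org.rocksdb.{Options, RocksDB}

import scala.jdk.CollectionConverters._

object KafkaBackedKvStore {
  def main(args: Array[String]): Unit = {
    // Embedded RocksDB instance: reads are served from local disk, no network hop.
    RocksDB.loadLibrary()
    val db = RocksDB.open(new Options().setCreateIfMissing(true), "/tmp/kv-store")

    // The Kafka topic is the write log. A unique group.id per instance means
    // every replica tails the full topic and applies every update locally;
    // "earliest" lets a fresh replica rebuild its state from the log.
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker address
    props.put("group.id", s"kv-replica-${UUID.randomUUID()}")
    props.put("auto.offset.reset", "earliest")
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.ByteArrayDeserializer")

    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
    consumer.subscribe(Collections.singletonList("kv-updates")) // hypothetical topic

    while (true) {
      // Apply each logged write to the local store; a business logic service
      // co-located in the same Aurora task can then query RocksDB directly.
      consumer.poll(Duration.ofMillis(500)).asScala.foreach { record =>
        db.put(record.key(), record.value())
      }
    }
  }
}
```

(Kafka Streams' state stores implement essentially this shape, RocksDB backed by a changelog topic, out of the box; rolling your own, as sketched here, buys control over footprint and deployment.)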

Qualifications:
- 3+ years of proven industry experience working on backend services or infrastructure for a large-scale, highly distributed website or web service
- Solid foundation in computer science fundamentals, with sound knowledge of data structures, algorithms, and design
- Strong Java or other object-oriented programming experience, or, even better, experience with and/or interest in functional languages (we use Scala!)
- Familiarity with JVM profiling and GC tuning; experience with tools like YourKit, JMH, statsd-jvm-profiler, or equivalents is a plus
- Experience designing and deploying large-scale distributed systems, either serving online traffic or for offline computation, including experience with concurrency, multithreading, and synchronization
- Bonus points for experience with Hadoop, MongoDB, Finagle, Kafka, ZooKeeper, Graphite (or other time-series metrics stores), Grafana, Linux system administration, Chef (or equivalent experience with Puppet, Ansible, etc.), and Aurora (or other cluster management frameworks like Marathon or Kubernetes)
- Comfortable in a small, fast-paced startup environment