Stanford Graduate School of Business
Stanford's Graduate School of Business (GSB) has built a global reputation through immersive and innovative management programs. We provide students with a transformative leadership experience, pushing the boundaries of knowledge with faculty research and offering a portfolio of entrepreneurial and non-degree programs that deliver global impact like no other. We are committed to advancing diversity, equity, and inclusion in service of our mission of developing innovative, principled, and insightful leaders who change lives, change organizations, and change the world. We invite you to be part of this mission.
The GSB is currently looking for either an Associate or a Senior Associate to join our team. Based on the applicant’s education and experience, Human Resources will review and verify qualifications to determine the appropriate level.
As a (Senior) Research Computing and Analytics Associate on the Data, Analytics, and Research Computing (DARC) team, you will draw on deep technical knowledge and interpersonal skills to facilitate and accelerate academic research at the Stanford University Graduate School of Business (GSB). This will entail collaborating with a diverse group of researchers, faculty, students, and staff to assist in the selection and development of innovative technical solutions to resolve research questions. With the solutions that you develop, you will also deliver learning opportunities and training resources to GSB researchers.
You will join a team of research analytics scientists, data engineers, and research computing specialists to support research at the GSB. This role focuses on the data engineering side of the team, and you will be responsible for developing and maintaining data pipelines for transferring, manipulating, and ingesting multiple terabyte-scale datasets to support GSB faculty research and data needs. Your clients will include GSB faculty and collaborators with a wide range of academic backgrounds, research interests, methodological specialties, and levels of technical expertise. You will liaise between academic researchers and data providers, communicating effectively with database administrators, security professionals, research scientists, graduate students, and software engineers. In this data engineering role, you will bring an understanding of the ecosystem of the research data lifecycle and partner with researchers to gather, store, and process data effectively.
This role requires general fluency with data and the ability to evaluate new datasets for correctness and completeness using programmatic tools. The engineer will regularly perform ETL tasks such as reshaping, merging, transferring, ingesting, and debugging datasets. The work will encompass data of various structures and formats, and the associate will draw on perseverance and problem-solving skills to handle unforeseen challenges. Your role will require you to understand each faculty member’s data pipeline goals and solve the problems they face in:
- Transferring large datasets, both structured and unstructured, to Stanford’s on-premises and cloud platforms using a variety of data transfer protocols and tools.
- Ensuring the quality, validity, and completeness of data transferred.
- Understanding data security best practices.
- Working with contractual and vendor requirements for data transfer, storage, and use.
In addition to supporting individual researchers, you will be expected to contribute significantly to other long-range projects to improve the data engineering ecosystem, including:
- Creating reproducible data pipelines and documentation of research data lifecycle management.
- Identifying and testing new tools and platforms for data transfer and storage.
- Recommending systems that best align with a researcher's use case and computing needs.
- Developing example workflows for various featured datasets that researchers can leverage.
We are a team of data enthusiasts with an insatiable curiosity for data, research, and technology. We bring dedication to our work and a commitment to growth and balance in our team. Expect to work with bright minds on challenging problems, continually evaluate emerging tools, and apply new techniques in data engineering. As these technologies and research questions evolve, so will your role.
Your primary responsibilities include:
- Participate in collaborations with faculty and researchers to understand research goals and to help identify technical obstacles and solutions.
- Collaborate with outside vendors to ensure the research data is transferred in a timely and secure manner.
- Work with staff members and researchers to understand the data types collected and how to integrate research workflows across various databases and data warehouses.
- Research and suggest new toolsets/methods to improve data ingestion, storage, processing, and access.
- Contribute to developing guidelines, standards, and processes to ensure the security of systems and data appropriate to risk.
- Monitor ETL processes, system audits, and system performance, proactively fixing issues as they are found.
- Serve as technical interface with software vendors and data providers, ensuring that researchers have effective access to data in selected systems.
- Contribute to formulating technical strategies, designing and engineering them to achieve research goals, using external vendors as needed.
- Participate in research and development efforts to enhance the team's capabilities in research support, including creating tutorials, testing new data collection methods, and staying abreast of relevant literature.
- Develop and deliver documentation and training for faculty and research staff.
MINIMUM REQUIREMENTS
This role can either be classified as an Associate or a Senior Associate level based on the applicant’s education and experience. Human Resources will review and verify the stated qualifications of the selected applicant to determine the appropriate level.
- Bachelor’s degree and a minimum of five years of relevant experience in computer science or engineering, or a combination of education and relevant experience.
- Knowledge of key data transfer, data warehousing, data storage technologies, and ETL techniques pertinent to large structured and unstructured datasets and research data workflows.
- Ability to support custom-built ETL, auditing, and scheduling programs (i.e., programmatically processing data, then transforming and loading it into a data warehouse).
- Experience with research software written in one or more of R, Stata, Matlab, SAS, Julia, and JavaScript.
- Experience with data processing at scale and a basic understanding of which tools are appropriate at which times.
- Knowledge of key database technologies (relational databases and SQL).
- Ability to analyze research systems and data pipelines and propose solutions that leverage existing and emerging technologies, and to collaboratively research, evaluate, architect, and deploy new tools, frameworks, and patterns for building scalable ETL data pipelines.
- Excellent problem-solving capabilities and creativity in developing bespoke solutions.
- Strong interpersonal and communication skills for effective collaboration with a diverse group of stakeholders.
- Ability to interface with customers from across Stanford GSB and the GSB Library to understand their requirements and deliver the data solutions they need.
- Continuous learner committed to understanding the latest technologies and research methodologies, including research computing.
- Service-oriented and empathetic, comfortable helping researchers of varying skill levels.
- At least 1 year of relevant experience in a research or academic setting, with a demonstrated track record of applying data engineering techniques to diverse data.
Additional Desired Skills for the Associate role:
- Bachelor’s degree in computer science, data science, statistics, or a related field. Master’s degree preferred.
- Strong programming skills in Python.
- Ability to identify data application performance bottlenecks and tune performance (i.e., demonstrated experience in data modeling and in advanced SQL writing, debugging, and performance tuning).
- Experience facilitating large data transfers and ETL projects using tools such as Globus, rclone, and SFTP servers, and/or experience building and maintaining data warehouses.
- Familiarity with all aspects of research data security, including deidentification, network and file system ACLs, IRB compliance, etc.
- Experience executing workloads on interactive Linux systems and cloud-based platforms (AWS and/or BigQuery) and/or experience with high-performance/systems languages and techniques.
- Demonstrated commitment to continual learning and professional growth.
- Ability to work independently and manage time effectively in a fast-paced environment with unpredictable workflows.
In addition to the desired skills mentioned above, the Senior Associate role requires:
- Bachelor’s degree and a minimum of seven years of relevant experience in computer science or engineering, or a combination of education and relevant experience. Master's degree in computer science, data science, statistics, or a related field preferred.
- High level of proficiency in data science and machine learning methodologies and tools.
- Experience with research software written in one or more of R, Stata, Matlab, SAS, Julia, JavaScript.
- Experience with data processing at scale and an understanding of which tools are appropriate at which times.
- Familiarity with text and image processing techniques (OCR, NLP, regex, image classification).
- Ability to develop and deploy full-stack applications and API integrations for a research setting.
- Excellent problem-solving capabilities and creativity in developing bespoke solutions.
- Strong interpersonal and communication skills for effective collaboration with a diverse group of stakeholders.
- Continuous learner committed to understanding the latest technologies and research methodologies, including research computing.
- Service-oriented and empathetic, comfortable helping researchers of varying skill levels.
- At least 3 years of relevant experience in a research or academic setting, with a demonstrated track record of applying data science and machine learning techniques to solve complex problems.