Location: 100% Remote (Candidates do not need to be in the USA)
As with all positions, candidates must satisfy our Universal Job Requirements.
For information on where to submit applications and what to include, please see our submission guidelines.
We are in search of an expert in semantic ontologies and reasoners, particularly skilled in standards-based rule languages and adept at handling large datasets. The ideal candidate should understand the intricacies of applying complex rules to extensive ontological datasets and be familiar with the challenges such datasets present, including techniques like partial materialization, Rete algorithms, Euler paths, etc.
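As a rough illustration of what naive materialization involves, the sketch below forward-chains a toy subClassOf hierarchy to a fixpoint. The tuple representation and predicate name are simplified stand-ins, and production reasoners replace this brute-force loop with incremental techniques such as the Rete algorithm or partial materialization.

```python
# A rough sketch of naive materialization, assuming triples are plain
# (subject, predicate, object) tuples. Production reasoners use
# incremental matching (e.g. Rete) or partial materialization rather
# than this brute-force fixpoint loop.
def materialize_subclass(triples):
    """Forward-chain subClassOf transitivity to a fixpoint."""
    closed = set(triples)
    while True:
        derived = {
            (a, "subClassOf", c)
            for (a, p, b) in closed if p == "subClassOf"
            for (x, q, c) in closed if q == "subClassOf" and x == b
        }
        if derived <= closed:   # fixpoint reached: nothing new derived
            return closed
        closed |= derived

# Hypothetical two-level hierarchy for illustration
kb = {
    ("Cat", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
}
closed = materialize_subclass(kb)
# ("Cat", "subClassOf", "Animal") is now entailed
```

On large ontologies this quadratic join over the whole triple set is exactly what becomes infeasible, which is why the techniques named above matter.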
- Senior-level experience, evaluated primarily on demonstrated skills rather than years of experience (equivalent to roughly 10 years of real-world expertise).
- Proficiency in Linux and Linux-based tools for day-to-day development.
- A willingness to rapidly learn new technologies in a fast-paced environment. Adaptability and the ability to learn quickly and independently are the most important skills for this role.
- Previous experience as a Data Scientist specializing in NLP in a remote environment. This may include contributions to open-source projects (commercial experience is not mandatory).
- Must be comfortable working fluidly across many different languages, without overreliance on a single language to address all problems.
- Exceptionally strong skills in NLP, machine learning, deep learning, and related tools.
- Proficiency in writing efficient algorithms, including an understanding of big-O and little-O notation and how to apply them.
- Strong mathematical skills, as the role often involves translating math-heavy white papers into code, requiring a deep understanding of mathematical concepts.
- Must be comfortable with and proficient in using Git.
- Proficiency in Java, Python, and other relevant programming languages.
- Experience with software development methodologies such as Agile or Scrum.
- Strong understanding of data structures and algorithms.
- Experience with database technologies, both SQL and NoSQL.
- Strong problem-solving skills and ability to think algorithmically.
- Excellent communication skills and the ability to work well in a team.
- Familiarity with common NLP libraries and techniques, including Word2vec, part-of-speech tagging, stemming algorithms, etc.
- Experience with text-based diagramming tools such as Kroki, PlantUML, and Mermaid.
- Significant time spent working on open-source projects, with an open-source portfolio of work to share.
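For illustration, stemming (one of the techniques listed above) can be sketched as naive suffix stripping. This toy function is only a crude approximation of real stemmers such as Porter's algorithm, which also handle consonant doubling and other morphological rules.

```python
def simple_stem(word):
    """Toy stemmer: strip a few common English suffixes.

    Illustrative only; a real stemmer (e.g. Porter) applies ordered
    rewrite rules and handles cases like "running" -> "run" that
    plain suffix stripping gets wrong.
    """
    for suffix in ("ing", "ed", "ly", "es", "s"):
        # Require at least a 3-letter stem so short words survive intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(simple_stem("jumping"))  # -> jump
print(simple_stem("cats"))     # -> cat
```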
The NLP Data Scientist will be primarily responsible for developing, implementing, and optimizing NLP models for the CleverThis platform. A deep understanding of NLP, machine learning, and deep learning is required. They will be directly responsible for evaluating the system architecture, programming elements of the system assigned to them, mentoring more junior members of the team, and helping translate NLP concepts into manageable development tasks for developers who lack an NLP background.
Responsibilities shall include (but not be limited to):
- Developing and improving NLP models and algorithms.
- Writing, editing, and improving NLP white papers.
- Assisting other team members in understanding and implementing advanced topics in NLP, machine learning, and deep learning.
- Writing code in multiple programming languages and learning new languages as required.
- Participating in the planning and architecture of the core platform and other ancillary enterprise systems.
- Participating in the Agile development workflow, including but not limited to: daily status meetings, sprint planning, and sprint review.
- Working on Linux for all development tasks.
- Participating in peer reviews of source code.
- Designing and implementing NLP models and algorithms.
- Learning new technologies and techniques as required to accomplish tasks assigned.
- Integrating with, designing for, and writing queries against various types of databases, including semantic (e.g., SPARQL), SQL (e.g., PostgreSQL), and NoSQL (e.g., MongoDB).
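As a self-contained sketch of the SQL side of that work, the snippet below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the schema and data are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a lightweight stand-in for PostgreSQL.
# The "documents" schema is invented purely for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, lang TEXT, tokens INTEGER)"
)
conn.executemany(
    "INSERT INTO documents (lang, tokens) VALUES (?, ?)",
    [("en", 120), ("en", 80), ("de", 95)],
)

# Aggregate token counts per language
rows = conn.execute(
    "SELECT lang, SUM(tokens) FROM documents GROUP BY lang ORDER BY lang"
).fetchall()
print(rows)  # -> [('de', 95), ('en', 200)]
```

Queries against semantic stores follow the same pattern conceptually, but are phrased as SPARQL graph patterns rather than relational joins.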