Data and charts related to the recent French bioinformatics job market.
This project aims to:
The data come from the Société Française de Bioinformatique (SFBI), an association that, among other activities, gathers job
offers and posts them on its website and mailing list.
You will find here information related to more than 2500 job offers that have been posted from April 2012 onward.
The data will be updated regularly (every 4-5 months).
This project concerns data of French origin, and was primarily intended for the French bioinformatics community.
English has been used for the code, but the output charts are in French.
Please read the details section before using the charts.
We highly recommend using the conda environment manager to install and use this
project. Not only does it provide a clean environment to work in, it also makes it easy to install all the
necessary packages.
The following procedure assumes you have already installed conda.
If not, here is the miniconda download page.
Make use of the provided environment definition file env.yml.
wget https://raw.githubusercontent.com/royludo/SFBIStats/master/env.yml
conda env create -f env.yml
This will set up a complete environment called sfbistatsenv with the core package requirements already installed.
Alternatively, you can use env_full.yml. It also contains the packages required by the code in the examples
directory. If you decide to use env.yml, refer to the README in each example's directory for the requirements that you
will have to install yourself.
In both cases, once the environment is created, don't forget to run source activate sfbistatsenv.
Clone the repository directly in your environment.
git clone https://github.com/royludo/SFBIStats.git
You will end up with a folder sfbistatsenv/SFBIStats containing the whole project.
Go into the project's directory and install the package:
python setup.py install
You probably want to use the data and create some charts. The examples folder contains scripts that make use of the
SFBI jobs data to produce charts as seen here or here.
Each example folder is different, and has its own dependencies. Please refer to the README provided in each folder for
instructions on how to install and run each example. If you used env_full.yml, you can run them directly.
Among the job offers posted on the SFBI mailing list, only the formatted ones (posted through the SFBI website) have been
considered, for practical reasons. The SFBI started formatting job offers through its website only in 2012. Before
that, offers were sent and forwarded to the list as is. That is why the dataset starts in April 2012.
However, throughout 2012 and part of 2013, users could still send unformatted offers to the mailing list. Those offers do not
appear here. Great care must therefore be taken when interpreting these data. The increase in job offers must
be put into perspective, as people were switching from the previous free-form job posts to the formatted ones.
Due to various technical issues (broken encodings, dead links…), a small number of offers do not appear in the
dataset. As these issues introduce no real bias, their impact should be negligible.
Each job entry contains the following fields:
Stage and Thèse don’t have any subtypes, so the contract_subtype field is empty.
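As a rough sketch of how the empty subtype can be handled when counting offers, the snippet below groups entries by contract type and subtype. The field name contract_type and the sample records are assumptions made for illustration; only contract_subtype is named in this README, and whether an empty subtype is stored as an empty string or left out is not specified, so both cases are handled.

```python
from collections import Counter

# Made-up records for illustration only; real entries come from jobs_anon.json.
# "contract_type" is an assumed field name, "contract_subtype" is named in this README.
sample_jobs = [
    {"contract_type": "CDD", "contract_subtype": "CDD autre"},
    {"contract_type": "Stage", "contract_subtype": ""},
    {"contract_type": "Thèse"},
    {"contract_type": "CDD", "contract_subtype": "CDD autre"},
]


def count_by_contract(jobs):
    """Count offers per (type, subtype); empty or missing subtypes become None."""
    counts = Counter()
    for job in jobs:
        subtype = job.get("contract_subtype") or None
        counts[(job.get("contract_type"), subtype)] += 1
    return counts


counts = count_by_contract(sample_jobs)
for (contract_type, subtype), n in sorted(counts.items(), key=str):
    print(contract_type, subtype, n)
```

Mapping the empty subtype to None keeps Stage and Thèse in the same table as CDD and CDI without inventing a subtype for them.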
The file jobs_anon.json contains all the data used in this project. It comes from MongoDB, so there are some points to
be aware of; see the documentation on the specific strict mode JSON format for more information.
The data can nonetheless be parsed easily with:
import json
from bson import json_util
# Open the file and let json_util convert the MongoDB-specific values
with open('jobs_anon.json') as f:
    jobs = json.load(f, object_hook=json_util.object_hook)
See the json_util doc as pointed out on this stackoverflow thread.
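If installing bson is not an option, the MongoDB-specific wrappers can also be decoded with the standard library alone. The following is a minimal sketch, not the project's own method: it only handles the "$oid" and "$date" wrappers, assumes dates are either milliseconds since the Unix epoch or ISO-8601 strings ending in "Z", and uses a made-up field name (submission_date) and ObjectId in the demo document.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def strict_json_hook(obj):
    """Turn {"$oid": ...} and {"$date": ...} wrappers into plain Python values."""
    if set(obj) == {"$oid"}:
        return obj["$oid"]  # keep the ObjectId as its hex string
    if set(obj) == {"$date"}:
        value = obj["$date"]
        if isinstance(value, (int, float)):
            # Milliseconds since the Unix epoch
            return EPOCH + timedelta(milliseconds=value)
        # ISO-8601 string; the trailing "Z" is rewritten for older Python versions
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return obj


# Demo document: the field name and values are hypothetical
doc = json.loads(
    '{"_id": {"$oid": "0123456789abcdef01234567"},'
    ' "submission_date": {"$date": 1333238400000}}',
    object_hook=strict_json_hook,
)
print(doc["submission_date"].isoformat())
```

The same hook can be passed to json.load when reading from an open file. Other wrappers that may appear in the strict mode format are left untouched as plain dictionaries.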
The data have been scraped from web pages, and are delivered raw. Sanitization of the fields is left to users, but feel
free to reuse the functions in sfbistats/utils/utils.py for that.
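To give an idea of what such sanitization can involve, here is a small standard-library sketch. It does not reproduce the functions in sfbistats/utils/utils.py; clean_field is a hypothetical helper, and the choice of steps (decoding HTML entities, Unicode normalization, collapsing whitespace) is an assumption about what scraped text typically needs.

```python
import html
import re
import unicodedata


def clean_field(raw):
    """Return a tidied copy of a scraped text field (hypothetical helper)."""
    text = html.unescape(raw)                  # "&eacute;" becomes "é"
    text = unicodedata.normalize("NFC", text)  # use composed accented characters
    text = re.sub(r"\s+", " ", text)           # collapse newlines, tabs, repeated spaces
    return text.strip()


print(clean_field("  Ing&eacute;nieur \n bioinformatique  "))
```

Applying the same cleaning to every text field before grouping or counting avoids treating two spellings of the same value as different categories.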
You can see a sample of the output charts here.
Each chart is named after the script that created it.
summary_5 and 10, time_series_8 and 9, summary_lins_6 and 7:
The education level required for a job has been inferred only from job subtypes, and concerns only CDD and CDI.
Stage and Thèse categories have been excluded.
The job was considered as requiring:
The fuzzy subtypes ‘CDD autre’ and ‘CDI autre’ have been excluded.
The information displayed in these charts is therefore only approximate. Use it with caution.
lexical_analysis 1-11:
Generated with the word_cloud module using the titles of the job offers.
Types and subtypes of contracts are specified in each image title.
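A word cloud is essentially a drawing of word frequencies, so the counting step can be sketched with the standard library alone. The titles below are made up, and title_frequencies with its min_length filter is a hypothetical helper, not necessarily how the project's scripts prepare the text.

```python
import re
from collections import Counter


def title_frequencies(titles, min_length=3):
    """Count lowercase words of at least min_length characters across all titles."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"\w+", title.lower()):
            if len(word) >= min_length:
                counts[word] += 1
    return counts


# Made-up titles for illustration only
titles = [
    "Ingénieur en bioinformatique",
    "Stage en bioinformatique",
    "Ingénieur NGS",
]
freqs = title_frequencies(titles)
print(freqs.most_common(3))
```

With the wordcloud package installed, a mapping like freqs can be passed to WordCloud().generate_from_frequencies(freqs) to draw the image.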
If you want to transform the charts with your own awesome style, if you have a better way to get the data (or more
data), or if you feel like some different kinds of charts could be useful, then don’t hesitate! Fork, code, and tell us
about it. We will happily accept any kind of contribution to this project!
19/01/2018
27/07/2017
10/02/2017
07/10/2016
Big thanks to the bioinfo-fr community, who sparked the motivation for this
project, and to nigiord in particular.
Credits for the original data go to the SFBI.
#bioinfo-fr on freenode. Nick is fragmeister.