Finding Types of Experience from Adzuna Job Ads

We're going to use linguistic features to extract the types of experience commonly required for jobs from job ads. I'm not exactly sure what I mean by "types of experience"; we're going to let the data decide that!

We'll end up with a list of skills, and some relationships between skills that occur together.

Import libraries and data

In [1]:
import re
import pandas as pd
import spacy
from spacy.util import filter_spans
from spacy.tokens import Span
from spacy.matcher import Matcher
In [2]:
spacy.__version__
Out[2]:
'2.2.3'
In [3]:
from spacy import displacy
from IPython.display import HTML, display

Get the data from the Adzuna Job Salary Prediction Kaggle competition, put it in the data subfolder and unzip all the files.

You can do this manually, or use the Kaggle API (once you've installed the API, downloaded your kaggle.json file and agreed to the competition rules).

In [4]:
# for split, ext in [('Test', 'zip'), ('Train', 'zip'), ('Valid', 'csv')]:
#     !kaggle competitions download -c job-salary-prediction --path data/ -f {split}_rev1.{ext}
    
# !find data/ -name '*.zip' -execdir unzip '{}' ';'
# !find data/ -name '*.zip' -exec rm '{}' ';'

# !ls data/

Read in all the data to a single dataframe

In [5]:
%%time
dfs = []
for split in ['Train', 'Valid', 'Test']:
    dfs.append(pd.read_csv(f'data/{split}_rev1.csv').assign(split=split))
df = pd.concat(dfs, sort=False, ignore_index=True)
del dfs
CPU times: user 6.55 s, sys: 2.8 s, total: 9.34 s
Wall time: 16.1 s

Train/Valid/Test is in the ratio of roughly 6:1:3, with about 408k ads in total

In [6]:
df.split.value_counts()
Out[6]:
Train    244768
Test     122463
Valid     40663
Name: split, dtype: int64
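
As a quick sanity check, the total and the split ratio can be worked out from the counts printed above. This is just arithmetic on the numbers copied from that output; nothing here touches the dataframe.

```python
# Counts copied from the value_counts() output above.
counts = {"Train": 244768, "Test": 122463, "Valid": 40663}

total = sum(counts.values())
# Express each split relative to the smallest one (Valid).
ratios = {split: round(n / counts["Valid"], 2) for split, n in counts.items()}

print(total)   # 407894
print(ratios)  # {'Train': 6.02, 'Test': 3.01, 'Valid': 1.0}
```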

We're mainly interested in the ad content where the skills will be; that's the FullDescription

In [7]:
df
Out[7]:
Id Title FullDescription LocationRaw LocationNormalized ContractType ContractTime Company Category SalaryRaw SalaryNormalized SourceName split
0 12612628 Engineering Systems Analyst Engineering Systems Analyst Dorking Surrey Sal... Dorking, Surrey, Surrey Dorking NaN permanent Gregory Martin International Engineering Jobs 20000 - 30000/annum 20-30K 25000.0 cv-library.co.uk Train
1 12612830 Stress Engineer Glasgow Stress Engineer Glasgow Salary **** to **** We... Glasgow, Scotland, Scotland Glasgow NaN permanent Gregory Martin International Engineering Jobs 25000 - 35000/annum 25-35K 30000.0 cv-library.co.uk Train
2 12612844 Modelling and simulation analyst Mathematical Modeller / Simulation Analyst / O... Hampshire, South East, South East Hampshire NaN permanent Gregory Martin International Engineering Jobs 20000 - 40000/annum 20-40K 30000.0 cv-library.co.uk Train
3 12613049 Engineering Systems Analyst / Mathematical Mod... Engineering Systems Analyst / Mathematical Mod... Surrey, South East, South East Surrey NaN permanent Gregory Martin International Engineering Jobs 25000 - 30000/annum 25K-30K negotiable 27500.0 cv-library.co.uk Train
4 12613647 Pioneer, Miser Engineering Systems Analyst Pioneer, Miser Engineering Systems Analyst Do... Surrey, South East, South East Surrey NaN permanent Gregory Martin International Engineering Jobs 20000 - 30000/annum 20-30K 25000.0 cv-library.co.uk Train
... ... ... ... ... ... ... ... ... ... ... ... ... ...
407889 72703426 Foreign Exchange Consultant Worcestershire Do you have foreign exchange cashier experienc... Worcestershire Worcestershire full_time permanent Travel Trade Recruitment Travel Jobs NaN NaN jobs.travelweekly.co.uk Test
407890 72703453 Senior Business Travel Consultant Senior Business Travel Consultant Birmingham ... Birmingham Birmingham full_time permanent AA Appointments Travel Jobs NaN NaN jobs.travelweekly.co.uk Test
407891 72705210 TEACHER OF MATHS Position: Qualified Teacher Subject/Specialism... Swindon Swindon NaN contract NaN Teaching Jobs NaN NaN hays.co.uk Test
407892 72705214 Welsh Speaking Teaching Assistant Job Hays Education currently have a job for a Wels... Cardiff Cardiff NaN contract NaN Teaching Jobs NaN NaN hays.co.uk Test
407893 72705218 KS2 Teacher Are you a School KS2 Teacher looking for temp ... Camberley Camberley NaN contract NaN Teaching Jobs NaN NaN hays.co.uk Test

407894 rows × 13 columns

Extract the ads into a list

In [8]:
ads = list(df.FullDescription)

Initialise the spaCy model

In [9]:
nlp = spacy.load('en_core_web_lg')

Extracting from job ads

Let's look at sentences in job ads containing the word 'experience'.

Experience is a common word, but used in a few different ways:

  • has experience with a tool/using a skill/in a system
  • providing an experience to customers
  • this job will give you experience

We're interested in the first kind, which occurs in a few different ways:

  • {type of} experience ...
  • experience in {field}

Let's look at extracting them

In [10]:
def highlight_terms(terms, texts):
    for doc in nlp.pipe(texts):
        for sentence in set([tok.sent for tok in doc if tok.lower_ in terms]):
            text = sentence.text.strip()
            markup = re.sub(fr'(?i)\b({"|".join(terms)})\b', r'<strong>\1</strong>', text)
            display(HTML(markup))
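
To see what the regex substitution does on its own, here is a minimal sketch that works on a plain string, without spaCy or the notebook display. The helper name `highlight_in_text` is made up for this example, and it adds `re.escape`, which the function above doesn't use, as a precaution for terms that contain regex metacharacters.

```python
import re

def highlight_in_text(terms, text):
    """Wrap whole-word, case-insensitive occurrences of any term in <strong> tags."""
    # re.escape keeps regex metacharacters inside a term from being interpreted.
    alternation = "|".join(re.escape(term) for term in terms)
    return re.sub(rf"\b({alternation})\b", r"<strong>\1</strong>", text, flags=re.IGNORECASE)

print(highlight_in_text(["experience"], "Experience of Pioneer; no inexperienced staff"))
# <strong>Experience</strong> of Pioneer; no inexperienced staff
```

The `\b` word boundaries are what stop "inexperienced" from being highlighted.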

Note that you can already see some problems with the way the text was cleansed; it looks like list structure is gone and hyphens have been removed (35 years experience is probably 3-5 years experience).

In [11]:
highlight_terms(['experience'], ads[:10])
Aerostructures experience You will also be expected to demonstrate the following qualities: A strong desire to progress quickly to a position of leadership Professional approach Strong communication skills, written and verbal Commercial awareness Team working, being comfortable working in international teams and self managing
The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment relevant to (but not limited to)
You will need to demonstrate experience in at least one or more of the following areas:
The roles are ideally suited to high calibre engineering graduates with any level of appropriate experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines.
Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models Experience in Modelling and Simulation Techniques, Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background
*K AAE pension contribution, private medical and dental The opportunity Our client is an independent consultancy firm which has an opportunity for a Data Analyst with 35 years experience.
In addition to formal qualifications and experience, the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed.
Any experience of Pioneer or Miser software would be an advantage.
This role will be working within a small team working on the modelling of water industry asset deterioration and asset failure consequences, including the uploading of these models onto industryleading optimal asset management software Strong maths, stats and IT skills needed, Any previous experience within the Water industry would be an advantage.
For this role, you must have a minimum of 10 years experience in subsea engineering, pipelines design or construction.
Naturally positive and looking for the opportunity to work for a national player who are committed to your success Recruitment experience or 2 years business to business sales experience required.
This is an exceptional opportunity to join a construction / technical agency that hasn t shrunk in the current market one bit Our client is seeking a nononsense and highly skilled Recruiter with at least a couple of years experience under their belt.
They will need someone who has at least 1015 years of subsea cable engineering experience with significant experience within offshore oil and gas industries.
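
One rough way to spot the lost hyphens is to flag year counts that are too large to be believable. This is only a heuristic sketch: the function name and the `max_plausible=30` cutoff are arbitrary assumptions, and it only flags suspicious values rather than trying to repair them.

```python
import re

def suspicious_year_counts(text, max_plausible=30):
    """Return digit strings before 'years' that are too large to be a plausible count."""
    found = re.findall(r"\b(\d+)\s+years\b", text, flags=re.IGNORECASE)
    return [digits for digits in found if int(digits) > max_plausible]

sample = "a Data Analyst with 35 years experience; at least 1015 years of subsea work; 10 years experience"
print(suspicious_year_counts(sample))  # ['35', '1015']
```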

Helper functions

Let's take a variety of informative examples to test extractions on

In [12]:
examples = [
    'They will need someone who has at least 1015 years of subsea cable engineering experience',
    'This position is ideally suited to high calibre engineering graduate with significant and appropriate post graduate experience.',
    'Aerospace industry experience would be advantageous covering aerostructures and/or aero engines.',
    'A sufficient and appropriate level of building services and controls experience gained within a client organisation, engineering consultancy or equipment supplier.',
    
    'Experience in Modelling and Simulation Techniques',
    'Any experience of Pioneer or Miser software would be an advantage.',
    'For this role, you must have a minimum of 10 years experience in subsea engineering, pipelines design or construction.',
    'Has experience within the quality department of a related company in a similar role Ideally from a mechanical or manufacturing engineering background.',
    'and have experience of the technical leadership of projects to time, quality and cost objectives.',
    'Experience of protection and control design at Transmission and Distribution voltages.',
    'Candidates with experience in telesales, callcentre, customer service, receptionist or travel are ideal for this role',
    'Experience dealing with business clients (B2B) would be preferable.',
    'Previous experience working as a Chef de Partie in a one AA Rosette hotel is needed for the position.',
    'The post holder must hold as a minimum Level 1 in Trampolining (British Gymnastics) and have experience in working with children, be fun, outgoing and have excellent customer service skills and be able to instruct in line with the British Gymnastics syllabus.',
    'Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background',

    
]

We could look to extract:

  • a series of nouns before the word experience (e.g. "subsea cable engineering experience"); or
  • experience as/in something (e.g. "experience as a Chef de Partie")

We'll do this using spaCy's rule-based Matcher

In [13]:
matcher = Matcher(nlp.vocab)
pattern = [{'POS': 'NOUN', 'OP': '+'}, {'LOWER': 'experience'}]
matcher.add('experience_noun', [pattern])

pattern = [{'LOWER': 'experience'}, {'POS': 'ADP'}, {'POS': {'IN': ('DET', 'NOUN', 'PROPN')}, 'OP': '+'}]
matcher.add('experience_adp', [pattern])
In [14]:
doc = nlp(examples[0])
matcher(doc)
Out[14]:
[(12285600890577657150, 13, 15), (12285600890577657150, 12, 15)]
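
Each tuple is (match id, start token, end token). The two matches share an id and an end but start at different tokens, which is what you'd expect from a one-or-more operator: every run of nouns ending just before "experience" is a match. Here's a toy pure-Python imitation of that behaviour. It is not how spaCy implements matching, and the tokens and POS tags are hand-assigned for illustration rather than taken from the model.

```python
def noun_run_matches(tokens, tags, keyword="experience"):
    """Return (start, end) spans of one-or-more NOUN tokens followed by the keyword."""
    matches = []
    for i, tok in enumerate(tokens):
        if tok.lower() != keyword:
            continue
        start = i
        # Walk left while the preceding token is a noun; each step is its own match.
        while start > 0 and tags[start - 1] == "NOUN":
            start -= 1
            matches.append((start, i + 1))
    return matches

tokens = ["subsea", "cable", "engineering", "experience"]
tags = ["ADJ", "NOUN", "NOUN", "NOUN"]  # hypothetical tags
print(noun_run_matches(tokens, tags))  # [(2, 4), (1, 4)]
```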

Here we have a little helper function to visualise extractions.

In [15]:
def show_extraction(examples, *extractors):
    seen = set()
    for doc in nlp.pipe(examples):
        doc.ents = filter_spans([Span(doc, start, end, label) for extractor in extractors for label, start, end in extractor(doc)])
        for tok in doc:
            if tok.lower_ == 'experience':
                sentence = tok.sent
                if sentence.text in seen:
                    continue
                seen.update([sentence.text])
                if not sentence.ents:
                    doc.ents = list(doc.ents) + [Span(doc, tok.i, tok.i+1, 'MISSING')]
                displacy.render(sentence, style='ent', options = {'colors': {'MISSING': 'pink',
                                                                            'EXPERIENCE': 'lightgreen'}})
                

This is on the right track, but doesn't always pick up the appropriate context.

In [16]:
show_extraction(examples, matcher)
They will need someone who has at least 1015 years of subsea cable engineering experience experience_noun
This position is ideally suited to high calibre engineering graduate with significant and appropriate post graduate experience experience_noun .
Aerospace industry experience experience_noun would be advantageous covering aerostructures and/or aero engines.
A sufficient and appropriate level of building services and controls experience experience_noun gained within a client organisation, engineering consultancy or equipment supplier.
Experience in Modelling experience_adp and Simulation Techniques
Any experience of Pioneer experience_adp or Miser software would be an advantage.
For this role, you must have a minimum of 10 years experience experience_noun in subsea engineering, pipelines design or construction.
Has experience within the quality department experience_adp of a related company in a similar role Ideally from a mechanical or manufacturing engineering background.
and have experience of the experience_adp technical leadership of projects to time, quality and cost objectives.
Experience of protection experience_adp and control design at Transmission and Distribution voltages.
Candidates with experience in telesales experience_adp , callcentre, customer service, receptionist or travel are ideal for this role
Experience MISSING dealing with business clients (B2B) would be preferable.
Previous experience MISSING working as a Chef de Partie in a one AA Rosette hotel is needed for the position.
The post holder must hold as a minimum Level 1 in Trampolining (British Gymnastics) and have experience MISSING in working with children, be fun, outgoing and have excellent customer service skills and be able to instruct in line with the British Gymnastics syllabus.
Experience of techniques experience_adp such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background

We can then extract them from a document.

Note the use of filter_spans; this ensures if we have overlapping spans we only take the largest one.
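
To make that concrete, here is a small stdlib sketch of the same idea on plain (start, end) pairs: prefer longer spans, then drop anything that overlaps a span we've already kept. It's an approximation of the behaviour described, written for illustration, not spaCy's own code; `filter_overlaps` is a made-up name.

```python
def filter_overlaps(spans):
    """Keep the longest spans first, dropping any that overlap an already-kept span.

    Each span is a (start, end) pair with an exclusive end.
    """
    # Longest first; ties go to the earlier start.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept, used = [], set()
    for start, end in ordered:
        if not any(i in used for i in range(start, end)):
            kept.append((start, end))
            used.update(range(start, end))
    return sorted(kept)

print(filter_overlaps([(13, 15), (12, 15)]))      # [(12, 15)]
print(filter_overlaps([(0, 2), (1, 4), (5, 6)]))  # [(1, 4), (5, 6)]
```

The first call uses the two overlapping matches from the matcher output earlier; only the longer one survives.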

In [17]:
def get_extractions(examples, *extractors):
    # Could pass as_tuples=True to nlp.pipe to carry the index as context instead of using enumerate
    for idx, doc in enumerate(nlp.pipe(examples, batch_size=100, disable=['ner'])):
        for ent in filter_spans([Span(doc, start, end, label) for extractor in extractors for label, start, end in extractor(doc)]):
            sent = ent.root.sent
            yield ent.text, idx, ent.start, ent.end, ent.label_, sent.start, sent.end
In [18]:
list(get_extractions(ads[:3], matcher))
Out[18]:
[('experience in a', 1, 150, 153, 'experience_adp', 122, 164),
 ('years experience', 2, 45, 47, 'experience_noun', 16, 48),
 ('decision support models Experience', 2, 92, 96, 'experience_noun', 79, 118),
 ('Experience of techniques', 2, 102, 105, 'experience_adp', 79, 118)]

Put it in a dataframe and join with the job metadata

In [19]:
def extract_df(*extractors, n_max=None, **kwargs):
    if n_max is None:
        n_max = len(df)
    ent_df = pd.DataFrame(list(get_extractions(df[:n_max].FullDescription, *extractors)),
                          columns=['text', 'docidx', 'start', 'end', 'label', 'sent_start', 'sent_end'])
    return ent_df.merge(df, how='left', left_on='docidx', right_index=True)
In [20]:
%time ent_df = extract_df(matcher, n_max=1000)
ent_df.head()
CPU times: user 16.2 s, sys: 11.7 s, total: 27.9 s
Wall time: 1min 18s
Out[20]:
text docidx start end label sent_start sent_end Id Title FullDescription LocationRaw LocationNormalized ContractType ContractTime Company Category SalaryRaw SalaryNormalized SourceName split
0 experience in a 1 150 153 experience_adp 122 164 12612830 Stress Engineer Glasgow Stress Engineer Glasgow Salary **** to **** We... Glasgow, Scotland, Scotland Glasgow NaN permanent Gregory Martin International Engineering Jobs 25000 - 35000/annum 25-35K 30000.0 cv-library.co.uk Train
1 years experience 2 45 47 experience_noun 16 48 12612844 Modelling and simulation analyst Mathematical Modeller / Simulation Analyst / O... Hampshire, South East, South East Hampshire NaN permanent Gregory Martin International Engineering Jobs 20000 - 40000/annum 20-40K 30000.0 cv-library.co.uk Train
2 decision support models Experience 2 92 96 experience_noun 79 118 12612844 Modelling and simulation analyst Mathematical Modeller / Simulation Analyst / O... Hampshire, South East, South East Hampshire NaN permanent Gregory Martin International Engineering Jobs 20000 - 40000/annum 20-40K 30000.0 cv-library.co.uk Train
3 Experience of techniques 2 102 105 experience_adp 79 118 12612844 Modelling and simulation analyst Mathematical Modeller / Simulation Analyst / O... Hampshire, South East, South East Hampshire NaN permanent Gregory Martin International Engineering Jobs 20000 - 40000/annum 20-40K 30000.0 cv-library.co.uk Train
4 experience within the Water industry 5 117 122 experience_adp 71 127 13179816 Engineering Systems Analyst Water Industry Engineering Systems Analyst Water Industry Loc... Dorking, Surrey, Surrey, Surrey Dorking NaN permanent Gregory Martin International Engineering Jobs 20000 - 30000/annum 20K to 30K 25000.0 cv-library.co.uk Train

Aggregate the counts of different texts.

It's more significant if it happens across multiple Advertisers/Sources.

In [21]:
def aggregate_df(df, col=['text']):
    return (df
            .groupby(col)
            .agg(n_company=('Company', 'nunique'),
                 n_ad=('Id', 'nunique'),
                 n_source=('SourceName', 'nunique'),
                 n=('Id', 'count'))
            .reset_index()
            .sort_values(['n_company', 'n_ad', 'n'], ascending=False)
        )
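
To show what the four columns mean, here's a rough stdlib equivalent of the same aggregation on a handful of made-up rows (the company names, ids and sources are invented for the example). Note how a phrase repeated within one ad bumps n but not n_ad.

```python
from collections import defaultdict

# Made-up rows for illustration: (text, company, ad id, source).
rows = [
    ("sales", "Acme", 1, "site-a"),
    ("sales", "Acme", 1, "site-a"),  # same phrase twice in one ad
    ("sales", "Acme", 2, "site-b"),
    ("sales", "Bolt", 3, "site-a"),
    ("design", "Acme", 4, "site-a"),
]

groups = defaultdict(list)
for text, company, ad_id, source in rows:
    groups[text].append((company, ad_id, source))

summary = {
    text: {
        "n_company": len({c for c, _, _ in items}),  # distinct advertisers
        "n_ad": len({a for _, a, _ in items}),       # distinct ads
        "n_source": len({s for _, _, s in items}),   # distinct source sites
        "n": len(items),                             # raw occurrences
    }
    for text, items in groups.items()
}
print(summary["sales"])  # {'n_company': 2, 'n_ad': 3, 'n_source': 2, 'n': 4}
```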

Unfortunately, these simple rules give mixed results

In [22]:
aggregate_df(ent_df).head(10)
Out[22]:
text n_company n_ad n_source n
119 experience in a 4 52 5 52
286 years experience 3 22 3 22
233 management experience 2 17 3 18
69 banqueting experience 2 2 1 2
87 design experience 2 2 1 2
88 development experience 2 2 1 2
196 experience within a 1 7 2 7
260 rosette experience 1 5 2 5
176 experience of the 1 4 2 4
142 experience in software development 1 3 1 3

Let's add some tooling to look at specific cases

In [23]:
def showent(docidx, start, end, label, sent_start, sent_end, **kwargs):
    # We don't need to parse it, so just make_doc
    doc = nlp.make_doc(ads[docidx])
    doc.ents = [Span(doc, start, end, label)]
    sent = doc[sent_start:sent_end]
    displacy.render(sent, style='ent')
    
def showent_df(df):
    for idx, row in df.iterrows():
        showent(**row)

We can see that the match stops at the determiner, so we've actually missed the type of experience entirely!

We could be a bit more clever and use some structure from the grammar to extract what we need.

In [24]:
showent_df(ent_df.query('text == "experience in a"').head())
The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a experience_adp professional engineering environment relevant to (but not limited to)
All Chef de Parties applying for this role must have a strong background with highlights previous AA Rosette experience in a experience_adp high volume operation.
CE hands on experience in a experience_adp nuclear plant environment.
Do you want to have experience in a experience_adp
The ideal Junior Sous will need to have at least two years experience in a experience_adp similar operation you will need to have high standards of presentation and passion for high quality fresh gastro pub food.

Extracting types of experience

Let's extract some examples of {type of} experience

Here's a rough rule to extract the phrase to the left of the word 'experience' using spaCy's noun_chunks, which are based on the syntactic structure (see spacy.lang.en.syntax_iterators)

In [25]:
def extract_noun_phrase_experience(doc):
    for np in doc.noun_chunks:
        if np[-1].lower_ == 'experience':
            if len(np) > 1:
                yield 'EXPERIENCE', np[0].i, np[-1].i
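
One detail worth spelling out: the function yields the index of the last token as the span end, and span ends are exclusive, so the word "experience" itself is left out of the extraction. Plain list slicing shows the same effect; the token list and chunk boundaries below are hand-picked assumptions.

```python
tokens = ["at", "least", "subsea", "cable", "engineering", "experience"]

# Suppose a noun chunk covers tokens 2..5 inclusive (a hand-picked assumption).
first, last = 2, 5

# Using the last token's index as an exclusive end leaves "experience" out.
without_keyword = tokens[first:last]
# Using last + 1 would cover the whole chunk instead.
whole_chunk = tokens[first:last + 1]

print(without_keyword)  # ['subsea', 'cable', 'engineering']
print(whole_chunk)      # ['subsea', 'cable', 'engineering', 'experience']
```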

Notice how our rule picks out the right amount of context like "subsea cable engineering".

However we're also picking up quantifiers like "Any" and "10 years"

In [26]:
show_extraction(examples, extract_noun_phrase_experience)
They will need someone who has at least 1015 years of subsea cable engineering EXPERIENCE experience
This position is ideally suited to high calibre engineering graduate with significant and appropriate post graduate EXPERIENCE experience.
Aerospace industry EXPERIENCE experience would be advantageous covering aerostructures and/or aero engines.
A sufficient and appropriate level of building services and controls EXPERIENCE experience gained within a client organisation, engineering consultancy or equipment supplier.
Experience MISSING in Modelling and Simulation Techniques
Any EXPERIENCE experience of Pioneer or Miser software would be an advantage.
For this role, you must have a minimum of 10 years EXPERIENCE experience in subsea engineering, pipelines design or construction.
Has experience MISSING within the quality department of a related company in a similar role Ideally from a mechanical or manufacturing engineering background.
and have experience MISSING of the technical leadership of projects to time, quality and cost objectives.
Experience MISSING of protection and control design at Transmission and Distribution voltages.
Candidates with experience MISSING in telesales, callcentre, customer service, receptionist or travel are ideal for this role
Experience MISSING dealing with business clients (B2B) would be preferable.
Previous EXPERIENCE experience working as a Chef de Partie in a one AA Rosette hotel is needed for the position.
The post holder must hold as a minimum Level 1 in Trampolining (British Gymnastics) and have experience MISSING in working with children, be fun, outgoing and have excellent customer service skills and be able to instruct in line with the British Gymnastics syllabus.
Experience MISSING of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background

Let's look at how this does across a larger sample of job ads:

  • There are sentence/word boundary errors that cause the rule to break (e.g. powerful decision support models Experience)
  • We pick up quantifiers (previous, some, appropriate), as well as time quantifiers (3-5 years, 10 years)
In [27]:
show_extraction(ads[:10], extract_noun_phrase_experience)
The roles are ideally suited to high calibre engineering graduates with any level of appropriate EXPERIENCE experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines.
The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some EXPERIENCE experience in a professional engineering environment relevant to (but not limited to)
You will need to demonstrate experience MISSING in at least one or more of the following areas:
Aerostructures EXPERIENCE experience You will also be expected to demonstrate the following qualities: A strong desire to progress quickly to a position of leadership Professional approach Strong communication skills, written and verbal Commercial awareness Team working, being comfortable working in international teams and self managing
*K AAE pension contribution, private medical and dental The opportunity Our client is an independent consultancy firm which has an opportunity for a Data Analyst with 35 years EXPERIENCE experience.
Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models EXPERIENCE Experience in Modelling and Simulation Techniques, Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background
In addition to formal qualifications and experience MISSING , the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed.
This role will be working within a small team working on the modelling of water industry asset deterioration and asset failure consequences, including the uploading of these models onto industryleading optimal asset management software Strong maths, stats and IT skills needed, Any previous EXPERIENCE experience within the Water industry would be an advantage.
Any EXPERIENCE experience of Pioneer or Miser software would be an advantage.
For this role, you must have a minimum of 10 years EXPERIENCE experience in subsea engineering, pipelines design or construction.
Naturally positive and looking for the opportunity to work for a national player who are committed to your success Recruitment experience or 2 years business to business sales EXPERIENCE experience required.
This is an exceptional opportunity to join a construction / technical agency that hasn t shrunk in the current market one bit Our client is seeking a nononsense and highly skilled Recruiter with at least a couple of years experience MISSING under their belt.
They will need someone who has at least 1015 years of subsea cable engineering EXPERIENCE experience with significant EXPERIENCE experience within offshore oil and gas industries.
In [28]:
%time ent_df = extract_df(extract_noun_phrase_experience, n_max=50000)
CPU times: user 14min 4s, sys: 6min 43s, total: 20min 48s
Wall time: 25min 59s

Again we are frequently picking up quantifiers

In [29]:
aggregate_df(ent_df).head(50)
Out[29]:
text n_company n_ad n_source n
13558 previous 894 2040 100 2128
5745 Previous 883 1745 100 1829
6321 Proven 260 468 73 515
15315 some 229 355 65 367
17077 your 221 864 70 874
11492 extensive 220 381 68 389
14616 relevant 208 398 62 405
14298 proven 189 301 63 305
10820 demonstrable 142 217 54 222
16178 the 138 303 56 313
14971 significant 133 207 53 209
15642 strong 125 237 53 240
1810 Any 124 226 40 234
3525 Extensive 123 203 48 225
14849 sales 117 201 35 204
932 2 years 117 195 49 198
11195 equivalent 113 179 41 190
16576 the relevant 105 261 43 263
2946 Demonstrable 104 169 53 186
7462 Strong 102 165 46 169
16438 the following 99 164 47 165
10460 commercial 96 167 33 171
14130 prior 95 163 42 167
15202 solid 93 156 46 161
7320 Some 91 133 37 137
14449 recent 88 213 47 220
11895 good 88 127 43 128
1089 3 years 86 144 44 152
13472 practical 77 99 36 107
7142 Significant 75 136 45 147
17219 73 148 26 204
6215 Prior 72 110 37 114
1270 5 years 71 101 38 109
5192 No 70 157 24 166
9746 at least 2 years 68 138 36 138
9542 any 65 111 32 112
3329 Essential 61 110 34 111
12609 management 61 104 36 106
16736 this 59 116 41 118
6655 Relevant 58 92 43 95
2994 Desirable 57 102 35 102
16021 supervisory 56 105 31 106
107 * 56 97 34 117
264 ** 56 88 37 96
16666 their 56 86 43 86
7007 Sales 54 98 23 98
7265 Solid 54 77 32 82
6726 Required 53 99 29 99
12320 industry 52 82 29 83
15938 substantial 52 68 32 71
In [30]:
showent_df(ent_df.query("text=='Previous'").head(5))
Previous EXPERIENCE experience is always helpful but not essential, as long as you enjoy looking after people and are able to work evenings and weekends, we would like to hear from you.
Previous EXPERIENCE experience is always helpful but not essential, as long as you enjoy looking after people and are able to work evenings and weekends, we would like to hear from you.
Previous EXPERIENCE experience working as a Chef de Partie in a one AA Rosette hotel is needed for the position.
Previous EXPERIENCE experience in major infrastructural projects coupled with the ability to present tenders to the Directors with the technical and commercial skills required for a senior role.
Previous EXPERIENCE experience of working in a business managing large customers and internal stakeholders.

This looks like a bad parse (probably because of the stripped whitespace)

In [31]:
showent_df(ent_df.query("text=='Skills'").head(5))
Skills EXPERIENCE Experience
Skills EXPERIENCE Experience:
The successful candidate will be expected to assist other staff in the Finance Operations team as and when required Skills EXPERIENCE Experience: Good PC skills including MS Excel and Word Strong communication skills including telephone Good organisational skills
Skills EXPERIENCE Experience required: Experience of London market insurance underwriting.
Skills EXPERIENCE Experience

We can blacklist the most common qualifiers

In [32]:
experience_qualifiers = ['previous', 'prior', 'following', 'recent', 'the above', 'past',
                         
                         'proven', 'demonstrable', 'demonstrated', 'relevant', 'significant', 'practical',
                         'essential', 'equivalent', 'desirable', 'required', 'considerable', 'similar',
                         'working', 'specific', 'qualified', 'direct', 'hands on', 'handson', 
                         
                         'strong', 'solid', 'good', 'substantial', 'excellent', 'the right', 'valuable', 'invaluable',
                         
                         'some', 'any', 'none', 'much', 'extensive', 'no', 'more',
                         'your', 'their',
                         'years', 'months',
                         'uk',
                        ]

stopwords = ['a', 'an', '*', '**', '•', 'this', 'the', ':', 'Skills']

experience_qualifier_pattern = rf'\b(?:{"|".join(experience_qualifiers)})\b'

experience_qualifier_pattern
Out[32]:
'\\b(?:previous|prior|following|recent|the above|past|proven|demonstrable|demonstrated|relevant|significant|practical|essential|equivalent|desirable|required|considerable|similar|working|specific|qualified|direct|hands on|handson|strong|solid|good|substantial|excellent|the right|valuable|invaluable|some|any|none|much|extensive|no|more|your|their|years|months|uk)\\b'
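
A quick way to sanity-check a pattern like this is to run it over a few of the texts from the table earlier. The sketch below uses a shortened subset of the qualifier and stopword lists, and `is_candidate` is a made-up helper name; it works on plain strings rather than the dataframe.

```python
import re

# A shortened subset of the lists above, just for illustration.
qualifiers = ["previous", "relevant", "years", "no", "the right"]
stopwords = ["the", "Skills"]
qualifier_pattern = re.compile(rf'\b(?:{"|".join(qualifiers)})\b')

def is_candidate(text):
    """True if text is not a stopword and contains no qualifier as a whole word."""
    return text not in stopwords and not qualifier_pattern.search(text.lower())

texts = ["Previous", "the relevant", "2 years", "Skills", "sales", "customer service", "nononsense"]
print([t for t in texts if is_candidate(t)])
# ['sales', 'customer service', 'nononsense']
```

The word boundaries matter here: without them the qualifier "no" would also knock out "nononsense".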

If we filter out the qualifiers and stopwords, we start getting some skills out:

  • sales
  • commercial
  • management
  • supervisory
  • customer service
  • development
  • technical
  • telesales
  • financial services
  • design
  • project management
  • retail
  • business sales
  • SQL
  • marketing
  • people management
  • SAP
  • engineering
In [33]:
aggregate_df(ent_df[(~ent_df.text.str.lower().str.contains(experience_qualifier_pattern)) & # Not a qualifier
                     ~ent_df.text.isin(stopwords)]).head(50)
Out[33]:
text n_company n_ad n_source n
8436 sales 117 201 35 204
6108 commercial 96 167 33 171
7445 management 61 104 36 106
8733 supervisory 56 105 31 106
4013 Sales 54 98 23 98
7187 industry 52 82 29 83
1311 Commercial 43 68 20 71
9040 the customer 43 65 27 66
6359 customer service 42 58 23 69
9463 work 39 61 29 62
6471 development 28 43 19 44
6346 customer 27 41 22 43
4158 Skills/ 26 44 23 45
9353 user 26 39 20 44
2399 Ideally 25 30 17 30
4276 Supervisory 24 34 22 34
8857 telesales 22 38 17 38
67 ) 22 32 18 32
8809 technical 22 31 16 31
2859 Management 21 48 26 48
1967 Financial Services 21 31 18 31
6439 design 21 29 19 29
8547 skills 21 29 18 29
8325 registration 20 120 13 120
3616 Project management 20 30 16 30
8386 retail 19 36 21 36
3603 Project Management 19 25 14 25
5954 business sales 19 24 12 24
2649 Knowledge/ 18 26 12 30
3931 SQL 18 22 14 24
3667 Qualifications/ 17 35 13 35
7496 marketing 17 30 18 30
2461 Industry 17 27 13 27
6973 graduate 17 26 12 29
8166 professional 17 26 14 26
6732 experience 17 23 13 23
1577 Design 17 19 8 19
9138 the necessary 16 54 18 54
7172 indepth 16 28 18 30
4379 Technical 16 20 14 21
1854 Experience 16 18 9 18
7965 people management 15 25 16 25
4723 Work 15 21 15 21
3858 SAP 15 17 11 17
6652 engineering 15 16 11 16
7461 managerial 14 29 18 29
4150 Skills / 14 24 14 24
5663 appropriate 14 24 13 24
5431 agency 14 19 14 20
5531 an advantage 14 18 16 18

Commercial is more of a qualifier

In [34]:
showent_df(ent_df.query("text=='Commercial'").head(5))
Experience Required In order of importance:Substantial ManManagement, departmental, budgetary and Project Management and Commercial EXPERIENCE experience in a large multi country business or operation.
Commercial EXPERIENCE experience of SAS or SQL.
Commercial EXPERIENCE experience of administrating and supporting of Linux and VMWare.
Commercial EXPERIENCE experience in similar role, working with Autodesk drawing package (AutoCAD 2010 or newer)
Commercial EXPERIENCE experience with .Net

Management experience seems correct

In [35]:
showent_df(ent_df.query("text=='Management'").head(5))
You will need to be RMN qualified with Management EXPERIENCE experience within a hospital setting.
To ensure that CQC Standards for Quality & Safety are met at all times You will need to be RMN qualified with Management EXPERIENCE experience within a hospital setting ideally in a charge nurse role
To ensure that CQC Standards for Quality & Safety are met at all times You will need to be RMN qualified with Management EXPERIENCE experience within a hospital setting.
The successful candidate will have proven Management EXPERIENCE experience and hold a 1st level nursing qualification with an active NMC PIN.
You will ideally be RGN or RMN qualified with Management EXPERIENCE experience.
In [36]:
showent_df(ent_df.query("text=='Financial Services'").head(5))
Financial Services EXPERIENCE experience
Financial Services EXPERIENCE Experience
Financial Services EXPERIENCE experience would be beneficial.
Financial Services EXPERIENCE experience
Ability to work to targets Experience of successfully working in a targeted contact centre environment (with Financial Services EXPERIENCE experience) is essential.
In [37]:
showent_df(ent_df.query("text=='SQL'").head(5))
Further refining the physical design to meet system storage requirements Education/Experience Qualifications BS or MS degree or equivalent experience in a technical field Skills: 3 years SQL EXPERIENCE experience 3 years Oracle PL/SQL experience
SQL EXPERIENCE experience using one or more of SQL Server, Oracle or DB
SAS and or SQL EXPERIENCE experience is a key requirement as the role has a heavy analytical bias.
**k Global service provider is looking for a experienced Microsoft Dynamics Solutions Architect with extensive CRM, C, .Net and SQL EXPERIENCE experience.
Further refining the physical design to meet system storage requirements Education/Experience Qualifications BS or MS degree or equivalent experience in a technical field Skills: 3 years SQL EXPERIENCE experience 3 years Oracle PL/SQL experience
In [38]:
showent_df(ent_df.query("text=='development'").head(5))
**) Have development EXPERIENCE experience in C and JavaScript
We are offering an exciting opportunity for an experienced applications programmer with a number of years of .Net development EXPERIENCE experience.
Desirable experience would include creating UML diagrams, development EXPERIENCE experience with file systems / database storage, experience with clustered computer environments, or experience with distributed computation platforms.
Permanent The successful candidate will have a minimum of 5 years of development EXPERIENCE experience If you are looking for a new .NET Development challenge and are looking for a market leading Software Company with excellent career progression opportunities and expanding your knowledge and experience, send in your CV now and call Nick Bray on *
It is also desirable that you have had development EXPERIENCE experience in a variety of platforms (Windows, Linux, Android and/or other)

Extracting experience in a field

Another way experience is commonly stated is with an adposition

experience in/with modelling

For example

In [39]:
doc = nlp('Experience of protection and control design at Transmission and Distribution voltages.')
displacy.render([doc], style='dep', jupyter=True)
Tokens: Experience/NOUN of/ADP protection/NOUN and/CCONJ control/NOUN design/NOUN at/ADP Transmission/PROPN and/CCONJ Distribution/NOUN voltages./NOUN
Arc labels: prep, nmod, cc, conj, pobj, prep, nmod, cc, conj, pobj

We extract the experience by looking to the right of "experience" for a preposition (e.g. in/with), finding that preposition's object, and taking the span from the left edge of the object's subtree up to the object itself.

This is obviously quite specific to English.

In [40]:
def extract_adp_experience(doc, label='EXPERIENCE'):
    for tok in doc:
        if tok.lower_ == 'experience':
            for child in tok.rights:
                if child.dep_ == 'prep':
                    for obj in child.children:
                        if obj.dep_ == 'pobj':
                            yield label, obj.left_edge.i, obj.i+1

This works very well! All of our examples are specific.

Notice that:

  • We're missing conjunctions: we get "subsea engineering", but miss "pipelines design" and "construction"
  • We miss elaborations: "such as Discrete Event Simulation ..."
  • We miss experience in actions (experience in working with children)
In [41]:
show_extraction(examples, extract_adp_experience)
They will need someone who has at least 1015 years of subsea cable engineering experience MISSING
This position is ideally suited to high calibre engineering graduate with significant and appropriate post graduate experience MISSING .
Aerospace industry experience MISSING would be advantageous covering aerostructures and/or aero engines.
A sufficient and appropriate level of building services and controls experience MISSING gained within a client organisation, engineering consultancy or equipment supplier.
Experience in Modelling and Simulation Techniques EXPERIENCE
Any experience of Pioneer or Miser software EXPERIENCE would be an advantage.
For this role, you must have a minimum of 10 years experience in subsea engineering EXPERIENCE , pipelines design or construction.
Has experience within the quality department EXPERIENCE of a related company in a similar role Ideally from a mechanical or manufacturing engineering background.
and have experience of the technical leadership EXPERIENCE of projects to time, quality and cost objectives.
Experience of protection and control design EXPERIENCE at Transmission and Distribution voltages.
Candidates with experience in telesales EXPERIENCE , callcentre, customer service, receptionist or travel are ideal for this role
Experience MISSING dealing with business clients (B2B) would be preferable.
Previous experience MISSING working as a Chef de Partie in a one AA Rosette hotel is needed for the position.
The post holder must hold as a minimum Level 1 in Trampolining (British Gymnastics) and have experience MISSING in working with children, be fun, outgoing and have excellent customer service skills and be able to instruct in line with the British Gymnastics syllabus.
Experience of techniques EXPERIENCE such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background

An alternative strategy would be to look for a phrase like "Experience in/with/using" and then look for the noun phrase

Using spaCy's noun chunks we can do it backwards (I'm sure there's an easy way to do it forwards which could be quicker, but it's nice using spaCy's noun chunks directly):

In [42]:
def extract_adp_experience_2(doc):
    for np in doc.noun_chunks:
        start_tok = np[0].i
        if start_tok >= 2 and doc[start_tok - 2].lower_ == 'experience' and doc[start_tok - 1].pos_ == 'ADP':
            yield 'EXPERIENCE', start_tok, start_tok + len(np)
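
For comparison, a forwards version could index the noun chunks by their start position and then scan for "experience" followed by an adposition. The sketch below works on plain lists so that it runs without a model; `extract_adp_experience_forward`, `tokens` and `chunk_spans` are hypothetical names, and with spaCy the inputs would presumably be built from `doc` and `doc.noun_chunks`. It has not been timed against the versions above.

```python
def extract_adp_experience_forward(tokens, chunk_spans, label='EXPERIENCE'):
    """Scan left to right for 'experience' + adposition + noun chunk.

    tokens: list of (text, pos) pairs, standing in for a parsed doc.
    chunk_spans: list of (start, end) token offsets, standing in for noun chunks.
    """
    # Index the chunks by start offset so each lookup is a dict access
    end_by_start = {start: end for start, end in chunk_spans}
    for i, (text, _) in enumerate(tokens[:-2]):
        if text.lower() != 'experience':
            continue
        if tokens[i + 1][1] != 'ADP':
            continue
        start = i + 2
        if start in end_by_start:
            yield label, start, end_by_start[start]


# Hand-written stand-in for the parse of "Experience in subsea engineering"
tokens = [('Experience', 'NOUN'), ('in', 'ADP'),
          ('subsea', 'NOUN'), ('engineering', 'NOUN')]
chunk_spans = [(0, 1), (2, 4)]
spans = list(extract_adp_experience_forward(tokens, chunk_spans))
```

Here `spans` contains a single `(label, start, end)` tuple covering "subsea engineering", the same shape of result the backwards version yields.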
In [43]:
show_extraction(examples, extract_adp_experience_2)
They will need someone who has at least 1015 years of subsea cable engineering experience MISSING
This position is ideally suited to high calibre engineering graduate with significant and appropriate post graduate experience MISSING .
Aerospace industry experience MISSING would be advantageous covering aerostructures and/or aero engines.
A sufficient and appropriate level of building services and controls experience MISSING gained within a client organisation, engineering consultancy or equipment supplier.
Experience in Modelling and Simulation Techniques EXPERIENCE
Any experience of Pioneer or Miser software EXPERIENCE would be an advantage.
For this role, you must have a minimum of 10 years experience in subsea engineering EXPERIENCE , pipelines design or construction.
Has experience within the quality department EXPERIENCE of a related company in a similar role Ideally from a mechanical or manufacturing engineering background.
and have experience of the technical leadership EXPERIENCE of projects to time, quality and cost objectives.
Experience of protection and control design EXPERIENCE at Transmission and Distribution voltages.
Candidates with experience in telesales EXPERIENCE , callcentre, customer service, receptionist or travel are ideal for this role
Experience MISSING dealing with business clients (B2B) would be preferable.
Previous experience MISSING working as a Chef de Partie in a one AA Rosette hotel is needed for the position.
The post holder must hold as a minimum Level 1 in Trampolining (British Gymnastics) and have experience MISSING in working with children, be fun, outgoing and have excellent customer service skills and be able to instruct in line with the British Gymnastics syllabus.
Experience of techniques EXPERIENCE such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background

Comparing speeds with n_max=50, the two approaches take about the same time:

In [44]:
%time ent_adp_df = extract_df(extract_adp_experience, n_max=50)
CPU times: user 953 ms, sys: 266 ms, total: 1.22 s
Wall time: 1.28 s
In [45]:
%time ent_adp_df = extract_df(extract_adp_experience_2, n_max=50)
CPU times: user 984 ms, sys: 203 ms, total: 1.19 s
Wall time: 1.19 s

Extracting 50k results

In [46]:
%time ent_adp_df = extract_df(extract_adp_experience, n_max=50000)
CPU times: user 13min 29s, sys: 6min 32s, total: 20min 1s
Wall time: 25min
In [47]:
aggregate_df(ent_adp_df).head(50)
Out[47]:
text n_company n_ad n_source n
6342 a similar role 213 456 60 461
13767 the following 130 256 40 261
12074 sales 77 103 37 106
11041 one 55 80 30 85
13645 the design 53 74 31 83
14355 the use 49 71 28 72
8748 design 47 72 25 76
526 C 46 86 18 87
12167 selling 43 57 15 60
14498 this role 42 86 25 87
6851 all aspects 40 64 23 66
14461 this 39 57 23 58
9349 experience 38 55 29 55
13777 the following areas 37 63 25 65
11390 planning 37 46 31 46
12758 teaching 34 61 15 63
7469 any 34 54 25 54
8813 development 34 52 21 56
11645 project management 34 49 29 49
14475 this field 33 58 28 58
13921 the industry 33 51 23 51
5654 a manufacturing environment 31 46 18 46
3254 SQL Server 30 57 15 57
12359 software development 29 43 16 46
4031 a 28 50 17 50
12417 some 28 44 20 44
6294 a similar environment 28 42 19 42
3240 SQL 28 35 12 35
14465 this area 27 45 23 47
4 .NET 27 38 16 38
11805 recruitment 26 40 20 40
10171 knowledge 26 38 19 38
1867 Java 24 42 13 42
1529 HTML 24 40 14 40
6005 a range 24 36 26 37
13350 the UK 23 65 21 65
10532 managing projects 23 30 19 32
10486 management 22 47 20 47
4981 a commercial environment 22 32 19 32
7939 business 22 29 19 32
1691 IT 22 29 20 30
127 ASP.NET 22 23 10 23
14893 writing 21 29 15 30
13650 the development 21 28 20 32
9464 financial services 21 24 16 24
13402 the ability 20 37 21 37
3140 SAP 20 29 14 30
8029 care 19 78 14 79
14230 the role 19 46 16 46
13733 the field 19 29 19 29

The extraction works pretty well for "sales" (although the last example should be "sales interviewing skills")

In [48]:
showent_df(ent_adp_df.query("text=='sales'").head(5))
Key responsibility for clarifying service contract content to ensure maximum profit Ensure highest target margins are being achieved through negotiation focus price and cost effective contract conditions Close followup from start of negotiation until handover to the Service Division Representation of the Service Division in cross functional negotiation teams Develop project specific, winwin solutions in order to match customer s business case and our client s service goals Professional presentation of service contract quotations including proper cost benefit arguments Key Skills Service contract management and sales engineering experience in a capital goods industry is advantageous International experience of the target markets would be a distinct advantage Experience with sales EXPERIENCE to utilities is an asset Experience from
Have strong industry experience delivering mobile solutions in at least two industries Existing experience in sales EXPERIENCE and mobility, with an ability to harness and maintain relationships with clients.
The successful candidate will have the following: Experience in sales EXPERIENCE of digital technology solutions and systems Good understanding of convergence of ICT, digital communications technology and digital media and emerging trends Experience of the complete sales process from finding and qualifying sales opportunities, preparation of proposals, etc
Requirements: Fluency in English and German with Czech or Polish preferred Solid experience in sales EXPERIENCE or telesales is essential
You'll ideally bring proven sales experience, including sales EXPERIENCE interviewing skills and a track record in a target driven environment.

Selling works as well, but we lose the context of who they are selling to

In [49]:
showent_df(ent_adp_df.query("text=='selling'").head(5))
As part of the company s ambitious growth plans to double the current 5million turnover, these openings have been created for proven sales professionals who, ideally, possess experience of selling EXPERIENCE to either Facilities Management or Mechanical Engineering contractors.
As part of the company s ambitious growth plans to double the current 5million turnover, these openings have been created for proven sales professionals who, ideally, possess experience of selling EXPERIENCE to either Facilities Management or Mechanical Engineering contractors.
Experience of selling EXPERIENCE into the industrial sector is essential.
You MUST have a Scientific Degree / Qualification or Animal Health and Welfare Experience of selling EXPERIENCE into the veterinary / animal industry is preferred but not essential
The candidate: As a candidate for this role, you will have: an excellent track record of selling and sales management, experience of both inside and outside sales at a senior level, and experience of selling EXPERIENCE into corporate HR/L D functions
In [50]:
showent_df(ent_adp_df.query("text=='design'").head(5))
* Experience (5 years) of design EXPERIENCE & implementation of digital and analogue embedded hardware
* Experience in design EXPERIENCE , development or quality engineering
Enjoys working in team environment and has excellent communication and negotiation skills Experience in national multi unit retail environment, and process orientated Ability to work with people of various disciplines at all levels within organisation and strong and proven drive for results Proven experience in design EXPERIENCE , space planning, retail and equipment layouts Current driving licence,
Proven experience in design EXPERIENCE , space planning, retail and equipment layouts Current driving licence,
Candidates must have excellent creative and technical artworking skills, and proven experience of design EXPERIENCE for print and online (preferably with Flash, HTML5 or similar).
In [51]:
showent_df(ent_adp_df.query("text=='C'").head(5))
Experience of C EXPERIENCE and/or C++ Experience of LabVIEW and/or LabWindows Experience of embedded software systems Experience in developing solutions within common lifecycles such as: Waterfall; Spiral;
The Ideal Person: 1 years industrial development experience in C EXPERIENCE .Net
The Ideal Person: 12 years industrial development experience in C EXPERIENCE .Net
Min ****) in Computer Science, Software, Electrical or Electronic Engineering or other related discipline Minimum of 2 years development experience in C EXPERIENCE or C++, preferably on a UNIX platform (Solaris or Linux)
**) Have development experience in C EXPERIENCE and JavaScript

We're often getting "a" because of bad tokenization

In [52]:
showent_df(ent_adp_df.query("text=='a'").head(5))
Recent care experience within a EXPERIENCE Nursing Home or Care Home Environment
Due to the nature of service you will be required to learn all sections meaning you will become an all round chef with experience of all areas to a EXPERIENCE
You must have experience within a EXPERIENCE similar *
Significant experience as a EXPERIENCE
The ideal candidate will have previous experience within a EXPERIENCE
In [58]:
def highlight_text_context(terms, texts, n_before=1, n_after=2):
    context = []
    for doc in nlp.pipe(texts):
        sentences = list(doc.sents)
        idxs = [i for i, sent in enumerate(sentences) if any(term in sent.text.lower() for term in terms)]
        
        for idx in idxs:
            before = ''.join(sent.text for sent in sentences[max(idx-n_before, 0):idx])
            after = ''.join(sent.text for sent in sentences[idx+1:min(idx+n_after+1, len(sentences))])
            text = sentences[idx].text
            markup = re.sub(fr'(?i)\b({"|".join(terms)})\b', r'<strong>\1</strong>',
                                 f'<span style="color:blue">{text}</span>')
            display(HTML(before + markup + after))

The term "a" occurs mostly due to bad parsing because all numbers have been replaced with ****

In [59]:
terms = ['experience']

for _, q in ent_adp_df.query("text=='a'").head(7).iterrows():
    doc = nlp(q.FullDescription)
    if q.sent_start > 0:
        prev_sent = doc[q.sent_start - 1].sent.text
    else:
        prev_sent = ''
    
    if q.sent_end < len(doc):
        next_sent = doc[q.sent_end].sent.text
    else:
        next_sent = ''
        
    text = doc[q.sent_start:q.sent_end].text
    markup = re.sub(fr'(?i)\b({"|".join(terms)})\b', r'<strong>\1</strong>',
                     f'<span style="color:blue">{text}</span>')
    display(HTML(prev_sent + markup + next_sent))
The successful candidate will lead by example, be motivated and committed to helping others, and must be able to provide the following skills and experience: Recent care experience within a Nursing Home or Care Home Environment Have Excellent Communication Skills
Following the recent fashion of London restaurants this will be somewhere to enjoy relaxed fine dining in an environment that offers small and large plates that can be enjoyed sitting at the bar or in the main restaurant.Due to the nature of service you will be required to learn all sections meaning you will become an all round chef with experience of all areas to a****AA Rosette standard.
Our client is seeking someone who can manage an existing team with the assistance of the existing management team to ensure that customer s expectation and needs are meet from front of house to their room.You must have experience within a similar **
Create own brand designs inline with briefs.Significant experience as a****/
Working a variety of shifts you will be competent and confident in carrying out various duties for the hotel guests within food and beverage.The ideal candidate will have previous experience within a**** star property and have their own transport.
*** restaurants with a commitment to quality seasonal ingredients and showcasing the best local produce and artisan products from around the UK Pastry Sous Chef required with extensive experience in a**** star environment, experience of menu development, strong understanding and application of French, Asian, British and International cuisine, chocolate and bread making knowledge, management and leadership experience including ordering and costings Working closely with Head Pastry Chef and able to manage Pastry Kitchen in his absence.
Day to day working your own section within the kitchen along with additional responsibilities, you will be working with a team who wishes to progress and further achieve and take pride in all they do.You should ideally have experience in a**** rosette Kitchen and be passionate and focused about your career.
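
Since the stray "a" comes from masked numbers being glued to neighbouring words (e.g. `a****AA`), one possible mitigation is to pad each run of asterisks with spaces before parsing. This is only a sketch and has not been run against the full dataset; `separate_masked_numbers` is a hypothetical helper, and whether it actually improves the parse would need checking.

```python
import re

# Runs of asterisks are how numbers appear to be masked in the ads
MASK = re.compile(r'\*+')

def separate_masked_numbers(text):
    """Pad masked numbers with spaces so they tokenize separately from adjacent words."""
    padded = MASK.sub(lambda m: f' {m.group(0)} ', text)
    # Collapse the extra whitespace introduced by the padding
    return re.sub(r'\s+', ' ', padded).strip()

cleaned = separate_masked_numbers('experience of all areas to a****AA Rosette standard.')
```

Note that this changes character and token offsets relative to the original text, so any spans extracted afterwards refer to the cleaned string rather than to `FullDescription`.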

This is an interesting case where our heuristic extraction rule hasn't captured the complexity

In [60]:
displacy.render(nlp('Recent care experience within a Nursing Home or Care Home Environment'))
Tokens: Recent/ADJ care/NOUN experience/NOUN within/ADP a/DET Nursing/PROPN Home/PROPN or/CCONJ Care/PROPN Home/PROPN Environment/PROPN
Arc labels: amod, compound, prep, det, compound, nmod, cc, compound, compound, pobj

Expanding conjunctions

It would be useful to extract every type of experience mentioned in a long list:

In [61]:
doc = nlp("Candidates with experience in telesales, callcentre, customer service, receptionist or travel are ideal for this role.")
doc
Out[61]:
Candidates with experience in telesales, callcentre, customer service, receptionist or travel are ideal for this role.
In [62]:
displacy.render(doc)
Tokens: Candidates/NOUN with/ADP experience/NOUN in/ADP telesales,/PROPN callcentre,/PROPN customer/NOUN service,/NOUN receptionist/NOUN or/CCONJ travel/NOUN are/AUX ideal/ADJ for/ADP this/DET role./NOUN
Arc labels: nsubj, prep, pobj, prep, pobj, conj, compound, conj, conj, cc, conj, acomp, prep, det, pobj
In [63]:
span = doc[4:5]
span
Out[63]:
telesales

This function is a very crude approximation of spaCy's noun_chunks, used to get an approximate noun phrase ending at a given token

In [64]:
def get_left_span(tok, label='', include=True):
    offset = 1 if include else 0
    idx = tok.i
    while idx > tok.left_edge.i:
        if tok.doc[idx - 1].pos_ in ('NOUN', 'PROPN', 'ADJ', 'X'):
            idx -= 1
        else:
            break
    return label, idx, tok.i+offset
In [65]:
get_left_span(nlp('The Subsea pipeline engineering')[-1])
Out[65]:
('', 1, 4)
In [66]:
get_left_span(span.root)
Out[66]:
('', 4, 5)

This function follows the conj arcs to collect a token and every token coordinated with it

In [67]:
def get_conjugations(tok):
    new = [tok]
    while new:
        tok = new.pop()
        yield tok
        for child in tok.children:
            if child.dep_ == 'conj':
                new.append(child)
In [68]:
list(get_conjugations(span.root))
Out[68]:
[telesales, callcentre, service, receptionist, travel]

And we then expand them by getting the left span

In [69]:
[doc[start:end] for label, start, end in [get_left_span(tok) for tok in get_conjugations(span.root)]]
Out[69]:
[telesales, callcentre, customer service, receptionist, travel]

Note we could expand with other related terms like 'proficiency' or 'ability' or 'skill', but we won't for now (because they don't occur as much)
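
A quick way to check how often such terms occur, before adding them to the rule, is to count whole-word matches across a sample of ad texts. The sketch below is illustrative only: `count_terms` and the sample sentences are made up, and a real check would run over `df.FullDescription`.

```python
import re
from collections import Counter

def count_terms(texts, terms):
    """Count case-insensitive whole-word occurrences of each term across texts."""
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, terms)) + r')\b',
                         re.IGNORECASE)
    counts = Counter()
    for text in texts:
        counts.update(match.lower() for match in pattern.findall(text))
    return counts

# Made-up sample sentences, not taken from the dataset
sample = [
    'Experience in sales is essential. Experience of SQL is a bonus.',
    'Proficiency in Excel and the ability to work in a team.',
]
term_counts = count_terms(sample, ['experience', 'proficiency', 'ability', 'skill'])
```

Because the match is on whole words, "skill" would not count "skills"; plural forms would need to be listed separately or matched on lemmas.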

In [70]:
#old
EXP_TERMS = ['experience']
def extract_adp_conj_experience(doc, label='EXPERIENCE'):
    for tok in doc:
        if tok.lower_ in EXP_TERMS:
            for child in tok.rights:
                if child.dep_ == 'prep':
                    for obj in child.children:
                        if obj.dep_ == 'pobj':
                            for conj in get_conjugations(obj):
                                yield get_left_span(conj, label)

That's much better; we still lose elaboration (such as), but we're extracting much more from lists.

Notice that we're not getting Pioneer

In [71]:
show_extraction(examples, extract_adp_conj_experience)
They will need someone who has at least 1015 years of subsea cable engineering experience MISSING
This position is ideally suited to high calibre engineering graduate with significant and appropriate post graduate experience MISSING .
Aerospace industry experience MISSING would be advantageous covering aerostructures and/or aero engines.
A sufficient and appropriate level of building services and controls experience MISSING gained within a client organisation, engineering consultancy or equipment supplier.
Experience in Modelling and Simulation Techniques EXPERIENCE
Any experience of Pioneer or Miser software EXPERIENCE would be an advantage.
For this role, you must have a minimum of 10 years experience in subsea engineering EXPERIENCE , pipelines design or construction.
Has experience within the quality department EXPERIENCE of a related company in a similar role Ideally from a mechanical or manufacturing engineering background.
and have experience of the technical leadership EXPERIENCE of projects to time, quality and cost objectives.
Experience of protection and control design EXPERIENCE at Transmission and Distribution voltages.
Candidates with experience in telesales EXPERIENCE , callcentre EXPERIENCE , customer service EXPERIENCE , receptionist EXPERIENCE or travel EXPERIENCE are ideal for this role
Experience MISSING dealing with business clients (B2B) would be preferable.
Previous experience MISSING working as a Chef de Partie in a one AA Rosette hotel is needed for the position.
The post holder must hold as a minimum Level 1 in Trampolining (British Gymnastics) and have experience MISSING in working with children, be fun, outgoing and have excellent customer service skills and be able to instruct in line with the British Gymnastics syllabus.
Experience of techniques EXPERIENCE such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background

The reason we don't get Pioneer is the sentence

Any experience of Pioneer or Miser software would be an advantage.

really means

Any experience of Pioneer software or Miser software would be an advantage.

but we don't have any way to reconstruct the missing word (yet)

In [72]:
doc = nlp('Any experience of Pioneer or Miser software would be an advantage.')

displacy.render(doc)
Tokens: Any/DET experience/NOUN of/ADP Pioneer/PROPN or/CCONJ Miser/PROPN software/NOUN would/VERB be/AUX an/DET advantage./NOUN
Arc labels: det, nsubj, prep, nmod, cc, conj, pobj, aux, det, attr
In [73]:
show_extraction(['Any experience of Pioneer software or Miser software would be an advantage.'], extract_adp_conj_experience)
Any experience of Pioneer software EXPERIENCE or Miser software EXPERIENCE would be an advantage.
In [74]:
doc = nlp('Any experience of Pioneer software or Miser software would be an advantage.')

displacy.render(doc)
Tokens: Any/DET experience/NOUN of/ADP Pioneer/PROPN software/NOUN or/CCONJ Miser/PROPN software/NOUN would/VERB be/AUX an/DET advantage./NOUN
Arc labels: det, nsubj, prep, compound, pobj, cc, compound, conj, aux, det, attr
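
One possible way to reconstruct the shared word would be to repeat the head noun after each coordinated modifier. The sketch below uses a minimal stand-in for a token so that it runs without a model; `Tok`, `conjuncts` and `distribute_head` are hypothetical names, and the hand-built tree assumes that "Pioneer" attaches to "software" as `nmod` with "Miser" as its `conj`, which is how the first parse above appears to read. It would likely over-generate on some sentences, so it is not used in the rest of the notebook.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Tok:
    """Minimal stand-in for a parsed token: text, dependency label, children."""
    text: str
    dep: str
    children: List['Tok'] = field(default_factory=list)

def conjuncts(tok):
    """Yield tok and everything joined to it through 'conj' arcs."""
    yield tok
    for child in tok.children:
        if child.dep == 'conj':
            yield from conjuncts(child)

def distribute_head(head):
    """Repeat the head noun after each of its coordinated modifiers."""
    phrases = []
    for child in head.children:
        if child.dep in ('nmod', 'compound'):
            phrases.extend(f'{mod.text} {head.text}' for mod in conjuncts(child))
    # Fall back to the bare head when it has no modifiers
    return phrases or [head.text]

# Hand-built copy of the relevant part of the parse of "Pioneer or Miser software"
miser = Tok('Miser', 'conj')
pioneer = Tok('Pioneer', 'nmod', [Tok('or', 'cc'), miser])
software = Tok('software', 'pobj', [pioneer])
phrases = distribute_head(software)
```

With this tree, `phrases` holds "Pioneer software" and "Miser software", matching the rewritten sentence.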

Looking at a sample of ads it works alright

In [75]:
show_extraction(ads[:10], extract_adp_conj_experience)
The roles are ideally suited to high calibre engineering graduates with any level of appropriate experience MISSING , so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines.
The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment EXPERIENCE relevant to (but not limited to)
You will need to demonstrate experience MISSING in at least one or more of the following areas:
Aerostructures experience MISSING You will also be expected to demonstrate the following qualities: A strong desire to progress quickly to a position of leadership Professional approach Strong communication skills, written and verbal Commercial awareness Team working, being comfortable working in international teams and self managing
*K AAE pension contribution, private medical and dental The opportunity Our client is an independent consultancy firm which has an opportunity for a Data Analyst with 35 years experience MISSING .
Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models Experience in Modelling and Simulation Techniques EXPERIENCE , Experience of techniques EXPERIENCE such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background
In addition to formal qualifications and experience MISSING , the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed.
This role will be working within a small team working on the modelling of water industry asset deterioration and asset failure consequences, including the uploading of these models onto industryleading optimal asset management software Strong maths, stats and IT skills needed, Any previous experience within the Water industry EXPERIENCE would be an advantage.
Any experience of Pioneer or Miser software EXPERIENCE would be an advantage.
For this role, you must have a minimum of 10 years experience in subsea engineering EXPERIENCE , pipelines design or construction.
Naturally positive and looking for the opportunity to work for a national player who are committed to your success Recruitment experience MISSING or 2 years business to business sales experience required.
This is an exceptional opportunity to join a construction / technical agency that hasn t shrunk in the current market one bit Our client is seeking a nononsense and highly skilled Recruiter with at least a couple of years experience under their belt EXPERIENCE .
They will need someone who has at least 1015 years of subsea cable engineering experience with significant experience within offshore oil and gas industries EXPERIENCE .

Extracting Verbs followed by Adposition

Notice that in a phrase like 'Experience dealing with business clients', "experience" is followed by a verb, then an adposition, then the noun we want. We can write more complex rules to parse cases like this.

In [76]:
def extract_verb_maybeadj_noun_experience(doc, label='EXPERIENCE'):
    for tok in doc:
        if tok.lower_ in EXP_TERMS:
            for child in tok.rights:
                if child.dep_ == 'acl':
                    for gc in child.children:
                        if gc.dep_ == 'prep':
                            for ggc in gc.children:
                                if ggc.dep_ == 'pobj':
                                    for c in get_conjugations(ggc):
                                        yield get_left_span(c, label)
                        elif gc.dep_ == 'dobj':
                            for c in get_conjugations(gc):
                                yield get_left_span(c, label)

This works pretty well, when the parse works well

In [77]:
show_extraction(examples, extract_verb_maybeadj_noun_experience)
They will need someone who has at least 1015 years of subsea cable engineering experience MISSING
This position is ideally suited to high calibre engineering graduate with significant and appropriate post graduate experience MISSING .
Aerospace industry experience MISSING would be advantageous covering aerostructures and/or aero engines.
A sufficient and appropriate level of building services and controls experience gained within a client organisation EXPERIENCE , engineering consultancy EXPERIENCE or equipment supplier EXPERIENCE .
Experience MISSING in Modelling and Simulation Techniques
Any experience MISSING of Pioneer or Miser software would be an advantage.
For this role, you must have a minimum of 10 years experience MISSING in subsea engineering, pipelines design or construction.
Has experience MISSING within the quality department of a related company in a similar role Ideally from a mechanical or manufacturing engineering background.
and have experience MISSING of the technical leadership of projects to time, quality and cost objectives.
Experience MISSING of protection and control design at Transmission and Distribution voltages.
Candidates with experience MISSING in telesales, callcentre, customer service, receptionist or travel are ideal for this role
Experience dealing with business clients EXPERIENCE (B2B) would be preferable.
Previous experience working as a Chef de Partie EXPERIENCE in a one AA Rosette hotel EXPERIENCE is needed for the position.
The post holder must hold as a minimum Level 1 in Trampolining (British Gymnastics) and have experience MISSING in working with children, be fun, outgoing and have excellent customer service skills and be able to instruct in line with the British Gymnastics syllabus.
Experience MISSING of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background

Extracting types of experience across all job ads

Let's just focus on the cleanest rule

In [78]:
extract_exps = [extract_adp_conj_experience,]

Most of the false positives are due to bad dependency parsing (which is often due to bad tokenization/sentence splitting)

However, it looks like we're still extracting a lot of signal

This takes a while because we need to parse every job ad and then run the rules across them.

Since documents are independent we could easily distribute this across a cluster if we needed to (think Hadoop/Dask/GNU Parallel).
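
As a rough illustration of that idea on a single machine, the sketch below splits the ads into chunks and hands each chunk to a worker process using only the standard library. `toy_extract` is a hypothetical stand-in for "parse an ad, then run the extraction rules"; it is not the notebook's `extract_df`, and the chunk size is an arbitrary assumption.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def toy_extract(chunk):
    """Hypothetical worker: stands in for 'parse each ad, then apply the rules'.

    Takes [(docidx, text), ...] and returns [(docidx, n_mentions), ...].
    """
    return [(docidx, text.lower().count('experience')) for docidx, text in chunk]


def extract_parallel(texts, worker=toy_extract, chunk_size=1000, n_workers=None):
    """Run `worker` over chunks of (docidx, text) pairs in separate processes."""
    results = []
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves chunk order, so results stay aligned with the input
        for part in pool.map(worker, chunked(enumerate(texts), chunk_size)):
            results.extend(part)
    return results
```

In a script this would be called as `extract_parallel(ads)` under an `if __name__ == '__main__':` guard. A real worker would presumably load its language model once per process rather than once per chunk.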

In [79]:
len(df)
Out[79]:
407894
In [80]:
n_ads = len(df)
In [81]:
%%time
df_ents = extract_df(*extract_exps, n_max=n_ads)
CPU times: user 1h 58min 21s, sys: 59min 34s, total: 2h 57min 55s
Wall time: 3h 43min 48s
In [82]:
df_ents.to_csv('experience_adp_ents.csv', index=False)
In [83]:
df_ents = pd.read_csv('experience_adp_ents.csv', low_memory=False)
In [84]:
df_ents
Out[84]:
text docidx start end label sent_start sent_end Id Title FullDescription LocationRaw LocationNormalized ContractType ContractTime Company Category SalaryRaw SalaryNormalized SourceName split
0 professional engineering environment 1 153 156 EXPERIENCE 122 164 12612830 Stress Engineer Glasgow Stress Engineer Glasgow Salary **** to **** We... Glasgow, Scotland, Scotland Glasgow NaN permanent Gregory Martin International Engineering Jobs 25000 - 35000/annum 25-35K 30000.0 cv-library.co.uk Train
1 Simulation Techniques 2 99 101 EXPERIENCE 79 118 12612844 Modelling and simulation analyst Mathematical Modeller / Simulation Analyst / O... Hampshire, South East, South East Hampshire NaN permanent Gregory Martin International Engineering Jobs 20000 - 40000/annum 20-40K 30000.0 cv-library.co.uk Train
2 techniques 2 104 105 EXPERIENCE 79 118 12612844 Modelling and simulation analyst Mathematical Modeller / Simulation Analyst / O... Hampshire, South East, South East Hampshire NaN permanent Gregory Martin International Engineering Jobs 20000 - 40000/annum 20-40K 30000.0 cv-library.co.uk Train
3 Water industry 5 120 122 EXPERIENCE 71 127 13179816 Engineering Systems Analyst Water Industry Engineering Systems Analyst Water Industry Loc... Dorking, Surrey, Surrey, Surrey Dorking NaN permanent Gregory Martin International Engineering Jobs 20000 - 30000/annum 20K to 30K 25000.0 cv-library.co.uk Train
4 Miser software 5 211 213 EXPERIENCE 206 218 13179816 Engineering Systems Analyst Water Industry Engineering Systems Analyst Water Industry Loc... Dorking, Surrey, Surrey, Surrey Dorking NaN permanent Gregory Martin International Engineering Jobs 20000 - 30000/annum 20K to 30K 25000.0 cv-library.co.uk Train
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
315902 insurance 407886 46 47 EXPERIENCE 40 62 72703412 Marine & International Trade Lawyer **** Marine and International Trade Assistant **** ... London London NaN permanent NaN Legal Jobs NaN NaN hays.co.uk Test
315903 marine contracting 407886 49 51 EXPERIENCE 40 62 72703412 Marine & International Trade Lawyer **** Marine and International Trade Assistant **** ... London London NaN permanent NaN Legal Jobs NaN NaN hays.co.uk Test
315904 construction 407886 52 53 EXPERIENCE 40 62 72703412 Marine & International Trade Lawyer **** Marine and International Trade Assistant **** ... London London NaN permanent NaN Legal Jobs NaN NaN hays.co.uk Test
315905 shipbuilding 407886 55 56 EXPERIENCE 40 62 72703412 Marine & International Trade Lawyer **** Marine and International Trade Assistant **** ... London London NaN permanent NaN Legal Jobs NaN NaN hays.co.uk Test
315906 industry qualifications 407890 172 174 EXPERIENCE 157 174 72703453 Senior Business Travel Consultant Senior Business Travel Consultant Birmingham ... Birmingham Birmingham full_time permanent AA Appointments Travel Jobs NaN NaN jobs.travelweekly.co.uk Test

315907 rows × 20 columns

Because we kept the context, we can show where each label came from

In [85]:
showent_df(df_ents[:2])
The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment EXPERIENCE relevant to (but not limited to)
Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models Experience in Modelling and Simulation Techniques EXPERIENCE , Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background

Let's count the most common terms

In [86]:
df_ent_agg = aggregate_df(df_ents)
df_ent_agg.head(10)
Out[86]:
text n_company n_ad n_source n
56334 similar role 1447 3829 107 3848
54297 role 841 2529 103 2565
33715 design 702 1612 71 1730
33966 development 702 1590 91 1677
37952 following 684 2118 76 2160
36705 experience 634 1350 97 1369
44556 management 572 1159 94 1179
54523 sales 513 1176 83 1195
42695 knowledge 504 950 87 954
36155 environment 480 968 83 979
In [87]:
len(df_ent_agg)
Out[87]:
62589
In [88]:
from flashtext import KeywordProcessor
In [89]:
# case_sensitive=True keeps terms such as 'IT' distinct from the word 'it'
keyword_processor = KeywordProcessor(case_sensitive=True)
In [90]:
# Keep only terms that were extracted from the ads of at least 3 companies
skills = df_ent_agg.query('n_company >= 3').text
len(skills)
Out[90]:
12757
In [91]:
for skill in skills:
    keyword_processor.add_keyword(skill)
In [92]:
from collections import Counter
In [93]:
%%time
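counter = Counter()     # total number of matches per term
ad_counter = Counter()  # number of ads containing the term at least once
# Only the first 10,000 ads are counted here
for ad in ads[:10_000]:
    keywords = keyword_processor.extract_keywords(ad)
    counter.update(keywords)
    ad_counter.update(set(keywords))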
CPU times: user 13.8 s, sys: 93.8 ms, total: 13.9 s
Wall time: 14.9 s
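
The loop above keeps two tallies: `counter` counts every match, while `ad_counter` counts each term at most once per ad, because the keywords are de-duplicated with `set` first. Here is a minimal sketch of the difference, using made-up ads and a naive whole-word matcher (`find_terms`, hypothetical) in place of flashtext:

```python
import re
from collections import Counter

terms = ['SQL', 'sales']  # made-up terms for illustration


def find_terms(text, terms):
    """Naive stand-in for keyword extraction: case-sensitive whole-word matches."""
    pattern = r'\b(' + '|'.join(re.escape(t) for t in terms) + r')\b'
    return re.findall(pattern, text)


toy_ads = [
    'SQL developer with SQL tuning and sales exposure',
    'Field sales role',
]

counter = Counter()     # total number of matches
ad_counter = Counter()  # number of ads with at least one match
for ad in toy_ads:
    keywords = find_terms(ad, terms)
    counter.update(keywords)
    ad_counter.update(set(keywords))

print(counter)
print(ad_counter)
```

In this toy data 'SQL' is matched twice but in only one ad, so the ratio of the two tallies (2.0 here) shows how often a term is repeated within the ads that mention it.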
In [94]:
df_count_ad = pd.DataFrame(ad_counter.items(), columns=['text', 'n_ad_occur'])
df_count = pd.DataFrame(counter.items(), columns=['text', 'n_occur'])
In [95]:
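df_c = (
    df_ent_agg
    .merge(df_count, how='left', validate='1:1')
    .merge(df_count_ad, how='left', validate='1:1')
    # n_occur and n_ad_occur were counted on ads[:10_000] only, whereas n_ads and n_ad
    # cover every ad, so pct_ad_occur and ad_freq are relative scores, not true proportions
    .assign(
        pct_ad_occur=lambda df: df.n_ad_occur / n_ads,
        avg_occur=lambda df: df.n_occur / df.n_ad_occur,
        ad_freq=lambda df: df.n_ad_occur / df.n_ad,
    )
)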
In [96]:
df_c.to_csv('term_counts.csv', index=False)
In [97]:
df_c = pd.read_csv('term_counts.csv')
In [98]:
df_c.head(50)
Out[98]:
text n_company n_ad n_source n n_occur n_ad_occur pct_ad_occur avg_occur ad_freq
0 similar role 1447 3829 107 3848 257.0 250.0 0.000613 1.028000 0.065291
1 role 841 2529 103 2565 8811.0 5080.0 0.012454 1.734449 2.008699
2 design 702 1612 71 1730 1590.0 908.0 0.002226 1.751101 0.563275
3 development 702 1590 91 1677 2927.0 2077.0 0.005092 1.409244 1.306289
4 following 684 2118 76 2160 1232.0 1110.0 0.002721 1.109910 0.524079
5 experience 634 1350 97 1369 9882.0 6024.0 0.014769 1.640438 4.462222
6 management 572 1159 94 1179 2049.0 1516.0 0.003717 1.351583 1.308024
7 sales 513 1176 83 1195 2285.0 1135.0 0.002783 2.013216 0.965136
8 knowledge 504 950 87 954 2130.0 1652.0 0.004050 1.289346 1.738947
9 environment 480 968 83 979 1788.0 1535.0 0.003763 1.164821 1.585744
10 areas 477 1009 87 1037 1031.0 905.0 0.002219 1.139227 0.896928
11 field 464 933 89 956 435.0 398.0 0.000976 1.092965 0.426581
12 industry 442 988 83 1007 1089.0 899.0 0.002204 1.211346 0.909919
13 one 423 898 79 956 2199.0 1842.0 0.004516 1.193811 2.051225
14 delivery 382 713 76 747 975.0 760.0 0.001863 1.282895 1.065919
15 use 363 709 63 719 1007.0 866.0 0.002123 1.162818 1.221439
16 C 358 883 45 947 1099.0 665.0 0.001630 1.652632 0.753114
17 ability 355 629 82 631 2493.0 1874.0 0.004594 1.330309 2.979332
18 project management 355 582 81 589 171.0 151.0 0.000370 1.132450 0.259450
19 implementation 338 621 65 644 534.0 431.0 0.001057 1.238979 0.694042
20 projects 336 668 67 688 1579.0 1011.0 0.002479 1.561820 1.513473
21 planning 321 557 88 562 598.0 489.0 0.001199 1.222904 0.877917
22 teams 319 629 76 636 759.0 601.0 0.001473 1.262895 0.955485
23 maintenance 311 615 49 645 766.0 496.0 0.001216 1.544355 0.806504
24 testing 301 587 51 649 560.0 369.0 0.000905 1.517615 0.628620
25 Experience 298 629 65 635 3498.0 2232.0 0.005472 1.567204 3.548490
26 area 286 550 82 559 1800.0 1513.0 0.003709 1.189689 2.750909
27 SQL 278 555 51 579 550.0 323.0 0.000792 1.702786 0.581982
28 selling 271 705 50 729 319.0 244.0 0.000598 1.307377 0.346099
29 manufacturing environment 267 527 51 535 72.0 69.0 0.000169 1.043478 0.130930
30 marketing 265 487 68 495 598.0 420.0 0.001030 1.423810 0.862423
31 analysis 265 443 67 445 687.0 447.0 0.001096 1.536913 1.009029
32 HTML 256 532 49 539 288.0 212.0 0.000520 1.358491 0.398496
33 aspects 254 511 65 522 589.0 542.0 0.001329 1.086716 1.060665
34 more 241 536 60 577 1621.0 1395.0 0.003420 1.162007 2.602612
35 managing 239 387 63 397 1041.0 848.0 0.002079 1.227594 2.191214
36 business 237 457 65 475 4583.0 2684.0 0.006580 1.707526 5.873085
37 years 233 521 74 535 1728.0 1449.0 0.003552 1.192547 2.781190
38 Java 233 464 44 480 388.0 188.0 0.000461 2.063830 0.405172
39 training 231 450 64 458 3146.0 2351.0 0.005764 1.338154 5.224444
40 any 230 454 56 465 2130.0 1520.0 0.003726 1.401316 3.348018
41 staff 229 388 68 391 2857.0 1875.0 0.004597 1.523733 4.832474
42 systems 227 354 56 364 1309.0 860.0 0.002108 1.522093 2.429379
43 SQL Server 225 536 37 554 306.0 184.0 0.000451 1.663043 0.343284
44 programming 223 399 38 411 196.0 148.0 0.000363 1.324324 0.370927
45 similar environment 220 464 51 466 58.0 58.0 0.000142 1.000000 0.125000
46 sector 220 452 60 454 541.0 481.0 0.001179 1.124740 1.064159
47 this 219 451 65 456 10786.0 5986.0 0.014675 1.801871 13.272727
48 software development 219 428 37 443 154.0 116.0 0.000284 1.327586 0.271028
49 administration 218 381 45 400 432.0 371.0 0.000910 1.164420 0.973753
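
To make the three derived columns concrete, the first row of the table (`similar role`) can be reproduced by hand. The figures are copied from the outputs above, and the reading of `n_ad` as "ads where the rule extracted the term" is an assumption based on the column name. Note that `n_occur` and `n_ad_occur` were counted on `ads[:10_000]` only, while `n_ad` and `n_ads` refer to all 407,894 ads, so `pct_ad_occur` and `ad_freq` are best read as relative scores rather than true proportions.

```python
n_ads = 407894    # len(df): every ad in the dataset
n_ad = 3829       # ads where the rule extracted 'similar role' (assumed meaning)
n_occur = 257     # keyword matches in the counted sample
n_ad_occur = 250  # ads in the counted sample with at least one match

pct_ad_occur = n_ad_occur / n_ads  # sample count over full corpus size
avg_occur = n_occur / n_ad_occur   # matches per ad that mentions the term
ad_freq = n_ad_occur / n_ad        # keyword hits relative to rule extractions

print(f'{pct_ad_occur:.6f} {avg_occur:.6f} {ad_freq:.6f}')
# prints: 0.000613 1.028000 0.065291
```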
In [99]:
skills = list(
    df_c
    .query('n_company >= 3')
    # rows with NaN ad_freq (terms never matched in the counted ads) also fail this test
    .query('ad_freq < 100')
    .text
)
len(skills)
Out[99]:
8075
In [100]:
with open('skills.txt', 'w') as f:
    for skill in skills:
        print(skill, file=f)
In [120]:
n_max = 1000
# Print the first n_max skills, three per row
for a, b, c in zip(skills[:n_max:3], skills[1:n_max:3], skills[2:n_max:3]):
    print('{:<35}{:<35}{:<}'.format(a, b, c))
similar role                       role                               design
development                        following                          experience
management                         sales                              knowledge
environment                        areas                              field
industry                           one                                delivery
use                                C                                  ability
project management                 implementation                     projects
planning                           teams                              maintenance
testing                            Experience                         area
SQL                                selling                            manufacturing environment
marketing                          analysis                           HTML
aspects                            more                               managing
business                           years                              Java
training                           any                                staff
systems                            SQL Server                         programming
similar environment                sector                             this
software development               administration                     some
customer service                   technologies                       understanding
all                                recruitment                        Excel
CSS                                SAP                                similar position
building                           PHP                                writing
support                            financial services                 commercial environment
Oracle                             range                              IT
software                           installation                       JavaScript
teaching                           Windows                            track record
UK                                 a                                  variety
Sales                              Linux                              production
senior level                       C++                                manufacturing
exposure                           retail                             business development
tools                              .NET                               preparation
account management                 developing                         designing
work                               position                           ASP.NET
above                              construction                       team
sales environment                  monitoring                         similar
project                            telesales                          engineering
reporting                          coaching                           Project Management
techniques                         web development                    processes
repair                             setting                            products
commissioning                      application                        applications
minimum                            customers                          high volume
Financial Services                 configuration                      operation
Marketing                          client                             customer
MySQL                              databases                          Sage
integration                        XML                                construction industry
supervision                        problem                            engineering environment
Development                        care                               service
office environment                 finance                            Word
company                            people                             wide range
troubleshooting                    networking                         Design
working                            control                            SEO
procurement                        change                             accounts
equipment                          Windows Server                     forecasting
Python                             type                               data analysis
social media                       MVC                                Javascript
level                              budgeting                          services
delivering                         Active Directory                   sales role
SharePoint                         FMCG                               Microsoft
procedures                         Knowledge                          change management
maintaining                        Retail                             HR
research                           *                                  supervisory role
fault finding                      skills                             face
team management                    hospitality                        roles
deployment                         people management                  events
digital marketing                  Microsoft Office                   leadership
requirements                       order                              Customer Service
retail environment                 customer service environment       organisation
assessment                         SAS                                mentoring
process                            background                         CAD
operating                          food                               relationships
automotive industry                MS Office                          negotiation
audit                              product development                production environment
leading                            clients                            supporting
agency                             Photoshop                          financial services industry
food industry                      HTML5                              broad range
management level                   etc                                practice
AutoCAD                            documentation                      financial management
SSIS                               financial services sector          PPC
hands                              fields                             Citrix
technology                         budget management                  date
B2B sales                          line                               PR
payroll                            restaurant                         Web Services
banking                            business analysis                  Ruby
TSQL                               operations                         CRM
line management                    credit control                     mechanical engineering
report                             equivalent                         Exchange
market                             NHS                                compliance
JQuery                             insurance                          accounting
AJAX                               Business                           ASP.Net
jQuery                             number                             interpretation
implementing                       customer service role              auditing
Account Management                 sectors                            end
retail sector                      architecture                       manufacture
B2B environment                    healthcare                         Business Analyst
Cisco                              modelling                          manager
creation                           environments                       communications
consultancy                        part                               software testing
fast paced environment             management role                    unit testing
Finance                            agency environment                 more information
methodologies                      good understanding                 system
solutions                          TDD                                SSRS
safety                             specification                      marketing role
both                               fabrication                        children
professional services              FMCG environment                   relational databases
control systems                    servicing                          public sector
many                               stages                             inspection
Magento                            education                          pneumatics
manufacturing industry             performance                        print
cold calling                       corporate environment              mechanical design
IT industry                        build                              similar field
test                               Unix                               practices
assembly                           Visual Studio                      good knowledge
Project Manager                    belt                               logistics
contracts                          gas industry                       data management
familiarity                        database design                    Agile environment
risk management                    Microsoft SQL Server               server
web technologies                   distribution                       Business Development
site                               property                           relation
call centre environment            WPF                                catering
communication                      installing                         quality
security                           Outlook                            administrative role
communication skills               travel industry                    frameworks
media                              food manufacturing environment     ecommerce
hydraulics                         technical support                  them
building services                  Sage Line                          related field
application development            similar industry                   Apache
levels                             websites                           stock control
financial sector                   two                                Agile
focus                              validation                         scripting
.Net                               Drupal                             telecoms
welding                            customer services                  WCF
provision                          budgets                            Microsoft Excel
evidence                           performance management             stakeholder management
advertising                        consulting                         buying
Spring                             Management                         VB.NET
quality assurance                  Fanuc                              IIS
Engineering                        MS SQL Server                      CSS3
education sector                   contract management                PLC
machinery                          VMware                             industries
insurance industry                 web services                       online marketing
commercial                         aerospace industry                 professional services environment
telephone                          XHTML                              health
Telesales                          Quality Assurance                  pharmaceutical industry
field sales                        strategy                           other areas
web design                         example                            telemarketing
Perl                               machining                          Understanding
full development lifecycle         framework                          either
direct marketing                   CMS                                software design
email                              supply chain                       project delivery
Gas industry                       packaging                          Oil
coding                             purchase ledger                    Google Analytics
recruitment industry               ISO                                Ability
audits                             hotel                              languages
staff management                   Solidworks                         components
full software development lifecycle full project lifecycle             passion
capacity                           analytics                          leading teams
cleaning                           purchasing                         detail
product                            leisure                            social care
IFRS                               SSAS                               J****EE
previous experience                methods                            quality management systems
Business Analysis                  manufacturing sector               optimisation
interest                           Flash                              Insurance
evaluation                         related industry                   day
emphasis                           nuclear industry                   time
Hospitality                        Quantity Surveyor                  Access
Health                             water industry                     tuning
medium                             ERP systems                        Gas
similar roles                      Planning                           automotive sector
pricing                            benefits                           industrial environment
Business Objects                   technical environment              platforms
project planning                   injection moulding                 travel
motor industry                     media sales                        B2B
business environment               UNIX                               water
UML                                financial environment              SOA
large organisation                 PowerPoint                         Software
automation                         IP                                 fundraising
diary management                   handling                           degree
qualification                      schools                            Assistant Manager
costing                            CV                                 private sector
contractors                        ITIL                               most
VB.Net                             Android                            staff supervision
desire                             aerospace                          Aerospace industry
execution                          project work                       Recruitment
CRM systems                        year                               concept
EMC                                similar level                      Banking
process improvement                housing                            bar
web                                Key Stage                          Change Management
Aerospace                          call centre                        virtualisation
verification                       Credit Control                     department
medical devices                    presales                           developer
Telecoms                           FEA                                MS Project
disciplines                        school                             controls
Siemens                            years experience                   fashion
project manager                    requirements gathering             man management
meeting                            support role                       responsibility
team leadership                    Automotive industry                motivating
tooling                            milling                            Accounts
Operations                         publishing                         KS
management accounts                risk                               care sector
web applications                   digital                            financial reporting
Hibernate                          ERP                                product management
VMWare                             above areas                        scheduling
excellent communication skills     presentation                       estimating
ASP                                development role                   main contractor
electronics                        contract                           dementia
Advertising                        students                           community
completion                         successful delivery                construction sector
facilities management              these                              addition
running projects                   refurbishment                      senior role
least *                            standards                          Firewalls
coordination                       MS Excel                           supply chain management
regulated environment              Financial Services sector          full range
editing                            activities                         building relationships
invoicing                          market research                    supervisory level
Heidenhain                         General Manager                    further information
front                              manipulation                       email marketing
Administration                     new business development           wide variety
business planning                  Logistics                          business sales
energy                             test tools                         strong understanding
utilities                          Facilities Management              QA
asset management                   database management                negotiating
technical sales                    hotels                             major projects
Microsoft technologies             fine dining                        digital agency
pumps                              Testing                            Financial Services industry
firm                               marketing environment              mobile
other sectors                      database development               DNS
installations                      financial analysis                 success
Purchase Ledger                    quality systems                    supplier management
telecoms industry                  strong knowledge                   turning
issues                             SAGE                               recruiting
elderly care                       cost control                       retail sales
Chef de Partie                     Construction                       property management
Sharepoint                         guests                             development environment
performance tuning                 product design                     candidate
property sector                    Scripting                          3D modelling
selection                          Projects                           data
markets                            programme management               Microsoft Project
pitching                           energy sector                      Hyperion
motor trade                        start                              law firm
Head Chef                          Procurement                        Agile methodologies
leadership role                    financial control                  months
insurance sector                   Microsoft Exchange                 Manager
SQL databases                      ETL                                Prince
majority                           eCommerce                          reading
Payroll                            consultant                         Maintenance
R                                  Wordpress                          content management systems
firewalls                          testing tools                      Safety
busy environment                   content                            instrumentation
campaigns                          proposals                          machines
process mapping                    contracting                        TS
financial modelling                protocols                          Promotions
relevant field                     recovery                           Ajax
mental health                      managerial role                    digital media
Installation                       Virtualisation                     LAMP
Visual Basic                       JSON                               discipline
bids                               financial industry                 LAN
Sous Chef                          VB                                 systems development
web analytics                      NPD                                ERP system
Salesforce                         learning                           financial services environment
MS Word                            plumbing                           supervising
nursing                            above duties                       implementations
Manufacturing                      food manufacturing                 FMCG industry
process design                     ecommerce environment              Automotive
TUPE                               hardware                           Public Sector
excel                              financial accounting               Switches
internal audit                     Illustrator                        driving
tendering                          sourcing                           Account Manager
data entry                         administration role                contact centre environment
design patterns                    lean manufacturing                 certification
legal secretary                    BI                                 fault
management accounting              project management role            involvement
processing                         SOAP                               using
rail industry                      telecommunications                 Call Centre
promotions                         hospitality industry               restaurants
Sales Manager                      agile environment                  professional environment
residential                        SQL server                         collections
Microsoft Word                     analytical role                    agencies
report writing                     PA                                 Industry
VBA                                policies                           interpersonal skills
setup                              strategic planning                 business process
release                            charity sector                     sales ledger
trading                            engineering industry               estate agency
Networking                         commercial sector                  technical role
technology sector                  partnership                        Ethernet
copywriting                        negotiations                       review
Infrastructure                     Java development                   Project
expertise                          Head                               materials
cash handling                      detailed design                    Risk
FMCG manufacturing environment     assessing                          Content Management Systems
Degree                             PPM                                solution
operating systems                  Supply Chain                       property industry
combination                        Mazak                              support environment
storage                            Audit                              Office
cost                               Selenium                           Crystal Reports
Cognos                             Lean Manufacturing                 iOS
motors                             version control systems            systems integration
JUnit                              Microsoft products                 senior management level
service delivery                   healthcare sector                  retailer
SEN                                social housing                     drives
full lifecycle                     highways                           appreciation
governance                         Project Management role            internal sales
brand                              job                                Engineer
managerial level                   essential                          Internet technologies
merchandising                      Insurance industry                 configuring
London                             accessories                        software engineering
Business Intelligence              C development                      Risk Management
Sigma                              infrastructure                     curriculum development
layout                             Digital                            Software Development
continuous improvement             debugging                          migration
English                            application support                HACCP
engineering role                   network                            programmes
programming languages              WAN                                scripting languages
Jira                               SVN                                families
office administration              post                               related role
retail industry                    types                              CNC
full marketing mix                 domiciliary care                   ISO9001
engineering design                 executing                          full development life cycle
organising                         Year                               care setting
lead generation                    systems design                     web application development
following skills                   Recruitment Consultant             financial markets
JSP                                Utilities                          Entity Framework
banqueting                         investment banking                 Estimating
systems analysis                   adults                             Building Services
TFS                                Team Leader                        employee relations
high volume manufacturing environment  media industry                   other
minute                             office                             switches
Ecommerce                          structures                         CCTV
ITIL environment                   VoIP                               healthcare industry
plant                              rail                               RDBMS
running                            Oil Gas                            SME
contract negotiation               high level                         spreadsheets
information                        pensions                           contract negotiations
elements                           interpreting                       record
awareness                          Teaching Assistant                 retail background

Analysis

Look up a skill

Co-occurrence would be great for understanding skills!

In [102]:
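def filter_ents(query, exact=False, match_case=True):
    if exact and match_case:
        return df_ents[df_ents.text == query]
    elif exact:
        return df_ents[df_ents.text.str.lower() == query.lower()]
    else:
        # Escape the query so terms such as 'C++' are matched literally, and use
        # lookarounds rather than \b so a term ending in punctuation still matches
        pattern = fr'(?<!\w){re.escape(query)}(?!\w)'
        return df_ents[df_ents.text.str.contains(pattern, flags=0 if match_case else re.IGNORECASE)]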
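The three matching modes behave quite differently, so here is a rough sketch of them on a made-up list of entity texts. `match_terms` and `texts` are hypothetical stand-ins for `filter_ents` and `df_ents.text`; it uses plain `re` on a list rather than pandas, purely to keep the example self-contained.

```python
import re

# Made-up entity texts standing in for df_ents.text
texts = ['Java', 'JavaScript', 'Java development', 'core java', 'C++']

def match_terms(texts, query, exact=False, match_case=True):
    if exact and match_case:
        return [t for t in texts if t == query]
    if exact:
        return [t for t in texts if t.lower() == query.lower()]
    flags = 0 if match_case else re.IGNORECASE
    # Whole-word containment; the query is escaped so 'C++' is read literally
    pattern = re.compile(r'(?<!\w)' + re.escape(query) + r'(?!\w)', flags)
    return [t for t in texts if pattern.search(t)]

exact_cased = match_terms(texts, 'java', exact=True)
exact_uncased = match_terms(texts, 'java', exact=True, match_case=False)
contains_uncased = match_terms(texts, 'java', match_case=False)

print(exact_cased)       # nothing is exactly 'java'
print(exact_uncased)     # only 'Java'
print(contains_uncased)  # whole-word hits, so 'JavaScript' is left out
```

The whole-word condition is what keeps a query for Java from also pulling in JavaScript.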
In [103]:
def show_exp(query, exact=True, match_case=True, n_max=10):
    showent_df(filter_ents(query, exact, match_case)[:n_max])
In [104]:
def job_exp(query, exact=True, match_case=True):
    return filter_ents(query, exact, match_case).drop_duplicates('docidx')[['Company', 'Title']]
In [105]:
def related_experience(query, exact=True, match_case=True):
    return (
        df_ents[df_ents['docidx'].isin(filter_ents(query, exact, match_case).docidx.to_numpy())]
        .query('label == "EXPERIENCE"')
        .groupby('text')
        .agg(n=('text', 'count'),
             ads=('docidx', 'nunique'),
             advertisers=('Company', 'nunique'))
        .query('advertisers > 1')
        .sort_values(['advertisers', 'ads', 'n'], ascending=False)
    )
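
To make the three counts concrete, here is a small sketch of the same co-occurrence idea on a made-up table of extracted terms. `toy_ents` and `cooccurring_terms` are hypothetical names for illustration; the real function works on `df_ents` and additionally filters on the label and on `advertisers > 1`.

```python
import pandas as pd

# Made-up extracted terms: one row per mention of a term in an ad
toy_ents = pd.DataFrame({
    'docidx':  [0, 0, 1, 1, 2, 2, 2],
    'Company': ['A', 'A', 'B', 'B', 'A', 'A', 'A'],
    'text':    ['Python', 'Django', 'Python', 'Django', 'Python', 'Perl', 'Perl'],
})

def cooccurring_terms(ents, query):
    # Ads that mention the query term at least once
    docs = ents.loc[ents['text'] == query, 'docidx'].unique()
    return (
        ents[ents['docidx'].isin(docs)]
        .groupby('text')
        .agg(n=('text', 'count'),               # total mentions
             ads=('docidx', 'nunique'),         # distinct ads
             advertisers=('Company', 'nunique'))  # distinct companies
        .sort_values(['advertisers', 'ads', 'n'], ascending=False)
    )

related = cooccurring_terms(toy_ents, 'Python')
print(related)
```

In this toy data Perl is mentioned twice, but only in one ad from one company. Sorting by advertisers first (and dropping terms with a single advertiser, as the real function does) stops one repetitive ad from dominating the list.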

"Experience" is a result of bad parsing.

It looks like these were probably bulleted lists that have had the bullet markers stripped away, so consecutive items run together into one sentence. We could probably do something here to improve the sentence boundary detection.
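
One possible heuristic, sketched here with plain regular expressions rather than anything from the pipeline above: break the text before a capitalised cue word that directly follows a lowercase word, since that is where a bullet was probably removed. The cue list and the `split_stripped_list` helper are assumptions for illustration and would need tuning against real ads.

```python
import re

# Hypothetical cue words that tend to open a list item in these ads
CUES = ('Experience', 'Knowledge', 'Understanding')

# Break at whitespace that follows a lowercase letter or closing bracket
# and precedes a capitalised cue word
_cue_pattern = re.compile(r'(?<=[a-z)])\s+(?=(?:%s)\b)' % '|'.join(CUES))

def split_stripped_list(text):
    return _cue_pattern.split(text)

sample = ('Understanding and experience of Hibernate persistence technology '
          'Knowledge/experience of Eclipse RCP/OSGi/JFace '
          'Experience of integrating open source tools and libraries')

items = split_stripped_list(sample)
for item in items:
    print('-', item)
```

The same cue could in principle be used to mark sentence starts inside the spaCy pipeline before parsing, but that is not attempted here.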

In [106]:
show_exp('Experience', n_max=5)
A Degree or post graduate qualification Commercial experience in Software Testing (Manual and Automated Testing) Experience EXPERIENCE of software applications for some or all of version control, defect tracking, test case management, test suite automation Experience in the following:
Understanding and experience of Hibernate persistence technology Knowledge/experience of Eclipse RCP/OSGi/JFace Experience EXPERIENCE of integrating open source tools and libraries
Duties: Supervise the work of other Social Workers and Case files of complex caseloads Working as a member of a team to deliver a needsled professional Social Work service to children, young persons and families complying with legislative requirements and in accordance with established best practice Undertake case work ensuring that appropriate case records are rigorously maintained Contributes to all aspects of the work of their allocated team Actively seeks to support the work of the Team Manager Requirements: Qualified Social Worker Experience in Supervising other team members Experience EXPERIENCE working within Children and Families Experience of working in a Family Assessment and Safeguarding Team To discuss this role or any other Qualified Social Worker jobs in the West Midlands Area or in the Social Work field
Candidates must have the following skills/experience in order to be considered for this role: 5 years IT experience Experience of business systems analysis, business process and solution related projects Experience EXPERIENCE of implementing effective information systems Experience of management and financial accounts Project Management experience Knowledge of the software development cycle
Vast experience in Domiciliary care, preferably as a branch manager or similar Experience EXPERIENCE of developing services
In [107]:
show_exp('sales', n_max=5)
Key responsibility for clarifying service contract content to ensure maximum profit Ensure highest target margins are being achieved through negotiation focus price and cost effective contract conditions Close followup from start of negotiation until handover to the Service Division Representation of the Service Division in cross functional negotiation teams Develop project specific, winwin solutions in order to match customer s business case and our client s service goals Professional presentation of service contract quotations including proper cost benefit arguments Key Skills Service contract management and sales engineering experience in a capital goods industry is advantageous International experience of the target markets would be a distinct advantage Experience with sales EXPERIENCE to utilities is an asset Experience from
Have strong industry experience delivering mobile solutions in at least two industries Existing experience in sales EXPERIENCE and mobility, with an ability to harness and maintain relationships with clients.
The successful candidate will have the following: Experience in sales EXPERIENCE of digital technology solutions and systems Good understanding of convergence of ICT, digital communications technology and digital media and emerging trends Experience of the complete sales process from finding and qualifying sales opportunities, preparation of proposals, etc
Requirements: Fluency in English and German with Czech or Polish preferred Solid experience in sales EXPERIENCE or telesales is essential
You'll ideally bring proven sales experience, including sales EXPERIENCE interviewing skills and a track record in a target driven environment.
In [108]:
related_experience('sales').head(10)
Out[108]:
text                            n   ads  advertisers
sales                        1195  1176          513
customer service              138   132           56
marketing                     106   104           53
business development           46    42           26
retail                         64    64           24
account management             30    30           20
telesales                      30    30           20
promotions                     56    56           19
hospitality                    40    40           14
recruitment                    21    21           14
In [109]:
show_exp('project management', n_max=5)
Registered A minimum of 3 years team management experience within children's services Experience of project management EXPERIENCE and service development
Ensure organisational alignment Significant experience in strategic execution and project management EXPERIENCE in the insurance, financial services or Consultancy Industry
Ensure organisational alignment Significant experience in strategic execution and project management EXPERIENCE in the insurance, financial services or Consultancy Industry
Issue date required] Substantial recent experience of project management EXPERIENCE of building schemes as client or technical manager, preferably within a similar role and ideally within a public sector organisation Excellent understanding of building construction maintenance Good understanding of building contract processes, including partnering Microsoft Office skills Other preferable/desirable details to include on your CV, if applicable : Any local authority experience
a strong advantage Minimum of 4 years experience working in the asbestos industry Contract Management experience is highly advantageous Experience of project management EXPERIENCE is also desirable Strong communication skills
In [110]:
related_experience('project management').head(15)
Out[110]:
text                            n   ads  advertisers
project management            589   582          355
design                         24    23           18
delivery                       24    21           15
development                    17    17           10
management                     10    10            9
managing                       10    10            9
implementation                 10     9            7
planning                        8     8            7
building                        8     8            6
customer service               12    12            5
price control environment       9     9            5
projects                        9     9            5
business development            7     7            5
more                           16     6            5
construction                    6     6            5
In [111]:
filter_ents('price control environment')
Out[111]:
text docidx start end label sent_start sent_end Id Title FullDescription LocationRaw LocationNormalized ContractType ContractTime Company Category SalaryRaw SalaryNormalized SourceName split
34533 price control environment 47690 250 253 EXPERIENCE 244 257 68580067 Regulatory Analyst Assessing quantitatively the impact on shareho... Berkshire Berkshire NaN permanent NaN Consultancy Jobs 40000 - 45000 42500.0 michaelpage.co.uk Train
34539 price control environment 47692 270 273 EXPERIENCE 235 274 68580069 Regulatory Manager A fantastic opportunity has arisen for a Regul... Berkshire Berkshire NaN permanent NaN Consultancy Jobs 50000 - 60000 55000.0 michaelpage.co.uk Train
139178 price control environment 183711 414 417 EXPERIENCE 379 418 71631376 Regulatory Manager We are the UK s biggest water and sewerage com... Reading, Berkshire Reading NaN permanent Reed Engineering Jobs 45000 - 50000/annum depending on experience + ... 47500.0 cv-library.co.uk Train
187417 price control environment 242918 328 331 EXPERIENCE 322 335 72689668 Regulatory Analyst We are the UK ****;s biggest water and sewerag... Reading Reading NaN permanent Reed Consulting Accounting & Finance Jobs 40,000 to 43,000 41500.0 jobsite.co.uk Train
212607 price control environment 275788 229 232 EXPERIENCE 223 232 71680024 Regulatory Economist **** package We have an excellent opportunity for a Regulat... Reading Berkshire South East Reading NaN permanent Jonathan Lee Engineering & Manufacturing Accounting & Finance Jobs NaN NaN totaljobs.com Valid
288871 price control environment 375027 422 425 EXPERIENCE 416 429 71557474 Regulatory Analyst We are the UK s biggest water and sewerage com... Reading, Berkshire Reading NaN permanent Reed Other/General Jobs NaN NaN cv-library.co.uk Test
293650 price control environment 380912 228 231 EXPERIENCE 222 231 71745569 Regulatory Economist **** , **** , **** package We have an excellent opportunity for a Regulat... Reading,Berkshire UK NaN permanent Jonathan Lee Recruitment Product Eng Energy, Oil & Gas Jobs NaN NaN renewablescareers.com Test
310855 price control environment 401801 641 644 EXPERIENCE 606 645 72479775 Regulatory Manager What is the purpose of the role? You will be r... Reading Reading NaN permanent Thames Water Utilities Ltd Consultancy Jobs NaN NaN jobsite.co.uk Test
310859 price control environment 401803 670 673 EXPERIENCE 664 677 72479777 Regulatory Analyst What is the purpose of the role? You will have... Reading Reading NaN permanent Thames Water Utilities Ltd Consultancy Jobs NaN NaN jobsite.co.uk Test
In [112]:
related_experience('price control environment')
Out[112]:
text                                        n   ads  advertisers
price control environment                   9     9            5
project management                          9     9            5
economic regulatory policy development      7     7            3
economic regulatory price controls          3     3            3
economic regulatory price control           3     3            2
finance role                                2     2            2
In [113]:
related_experience('AJAX').head(15)
Out[113]:
text                            n   ads  advertisers
AJAX                           89    89           60
CSS                            33    33           21
HTML                           28    28           20
JavaScript                     21    21           17
Javascript                     17    17           12
PHP                            11    11            8
jQuery                          9     9            7
design                          7     7            7
IBM DB                         11    11            6
Java                            7     7            6
XML                             7     7            5
Hibernate                       5     5            4
MVC                             5     5            4
Web Services                    5     5            4
JSPs                            4     4            4
In [114]:
related_experience('Java').head(15)
Out[114]:
text                            n   ads  advertisers
Java                          480   464          233
C++                            75    75           41
C                              77    67           41
SQL                            27    26           15
JavaScript                     24    24           15
J****EE                        20    20           15
Linux                          17    17           15
Spring                         18    17           14
development                    19    16           14
HTML                           19    19           13
Hibernate                      14    13           11
experience                     12    12           11
Python                         14    14           10
highlevel language             14    14           10
hightraffic systems            14    14           10
In [115]:
related_experience('Python').head(15)
Out[115]:
text                            n   ads  advertisers
Python                        141   137           88
Perl                           24    24           12
Ruby                           18    18           12
Java                           14    14           10
C                              13    13            9
Bash                           15    15            7
Django                          8     8            7
PHP                            14    14            6
OpenFrameworks                  7     7            6
etc                             7     7            6
Linux                           7     7            5
Hadoop                          6     6            5
expertise                       5     5            5
Experience                      8     8            4
C++                             5     5            4
In [116]:
related_experience('C++').head(15)
Out[116]:
text                            n   ads  advertisers
C++                           337   313          151
C                             163   126           65
Java                           76    75           41
stages                         16    16           12
development                    13    13           11
highlevel language             14    14           10
hightraffic systems            14    14           10
this                           12    12            9
MFC                            25    18            8
Linux                          13    12            8
TDD                            29    17            6
C.                              8     8            6
VB6                             6     6            6
programming                     6     6            6
all                            25    25            5
In [117]:
related_experience('Javascript').head(15)
Out[117]:
text                            n   ads  advertisers
Javascript                    159   159           86
HTML                           83    82           50
CSS                            60    59           34
AJAX                           17    17           12
PHP                            16    14           10
JQuery                         12    12            8
experience                      9     9            8
Ajax                           15    15            7
Java                            8     7            7
Linux                          15    15            6
MySQL                           9     9            6
ASP                             5     5            5
VB6                            13    13            4
IIS                             5     5            4
SQL                             5     5            4
In [118]:
filter_ents('IBM DB')
Out[118]:
text docidx start end label sent_start sent_end Id Title FullDescription LocationRaw LocationNormalized ContractType ContractTime Company Category SalaryRaw SalaryNormalized SourceName split
1449 IBM DB 2647 311 313 EXPERIENCE 309 313 55409877 Java J****EE Developer ****k ****k Music, F... Java J****EE Developer ****k ****k Music, F... London London full_time permanent JOBG8 IT Jobs Up to 50,000 per year + 40000.00-50000.00 50000.0 planetrecruit.com Train
2757 IBM DB 4408 302 304 EXPERIENCE 289 304 61811863 Java J****EE Developer – ****k ****k Music, ... NEW Java J****EE Developer – ****k ****k Mu... London South East South East London NaN permanent Parham Consulting IT Jobs From 40,000 to 50,000 per annum 40,000 - 50,00... 45000.0 cwjobs.co.uk Train
12054 IBM DB 17428 133 135 EXPERIENCE 130 135 66925434 Application/Integration Developer We are looking for an experienced developer (*... Nottingham, Nottinghamshire Nottingham NaN permanent Seismic Group IT Jobs 35000 - 40000/annum 37500.0 cv-library.co.uk Train
33948 IBM DB 46689 301 303 EXPERIENCE 288 303 68567721 Java J****EE Developer ****k ****k Music, F... Java J****EE Developer ****k ****k Music, F... City of London - London The City full_time permanent London4Jobs IT Jobs 40000-50000 45000.0 london4jobs.co.uk Train
51186 IBM DB 68754 301 303 EXPERIENCE 288 303 68799489 Java J****EE Developer ****k ****k Music, F... Java J****EE Developer ****k ****k Music, F... Central London Central London full_time permanent Parham Consulting Ltd IT Jobs 40000.00 - 50000.00 GBP Annual 45000.0 jobs.newstatesman.com Train
51865 IBM DB 69555 311 313 EXPERIENCE 297 313 68806243 NEW Java J****EE Developer ****k ****k Mus... NEW Java J****EE Developer ****k ****k Mus... London,Euston,Kings Cross London NaN permanent Parham Consulting Ltd IT Jobs 40K - 50K + bonus, bens 45000.0 jobsite.co.uk Train
72518 IBM DB 95902 137 139 EXPERIENCE 134 139 69222789 Application/Integration Developer We are looking for an experienced developer (*... NOTTINGHAM Nottingham full_time permanent Seismic Recruitment IT Jobs From 35,000 to 40,000 per year 37500.0 fish4.co.uk Train
91450 IBM DB 120304 312 314 EXPERIENCE 298 314 69895464 NEW Java J****EE Developer ****k ****k Mus... NEW Java J****EE Developer ****k ****k Mus... London London full_time permanent PARHAM CONSULTING LIMITED IT Jobs From 40,000 to 50,000 per year + 40K - 50K + d... 45000.0 planetrecruit.com Train
103043 IBM DB 136377 301 303 EXPERIENCE 288 303 70322570 Java J****EE Developer ****k ****k Music, F... Java J****EE Developer ****k ****k Music, F... UK UK NaN permanent Parham Consulting Ltd IT Jobs 40000-50000 45000.0 fish4.co.uk Train
135503 IBM DB 179068 260 262 EXPERIENCE 247 262 71558569 Java J****EE Developer ****k****k Music, Film... Java J****EE Developer ****k ****k Music, F... London Greater London London NaN permanent NaN IT Jobs 50000 50000.0 technojobs.co.uk Train
148943 IBM DB 196178 302 304 EXPERIENCE 289 304 71810075 Java J****EE Developer Music/TV NEW Java J****EE Developer ****k ****k Mus... City of London - London The City full_time permanent UKStaffsearch IT Jobs 40000 - 50000 45000.0 ukstaffsearch.com Train
190258 IBM DB 246351 309 311 EXPERIENCE 307 311 66076642 Java J****EE Developer ****k ****k Music, Fi... Java J****EE Developer ? ****k ****k Music, ... LONDON London full_time permanent PARHAM CONSULTING LIMITED IT Jobs NaN NaN fish4.co.uk Valid
191233 IBM DB 247714 253 255 EXPERIENCE 243 255 66983814 Experienced Java J****EE Developer with Java, ... Experienced Java J****EE Developer with Java, ... London, UK London NaN permanent NaN IT Jobs NaN NaN theitjobboard.co.uk Valid
239297 IBM DB 310471 45 47 EXPERIENCE 40 47 68627859 DataStage designer DataStage designer Experienced required: Exper... Middlesex UK NaN contract Mpower Plus UK Ltd IT Jobs NaN NaN jobserve.com Test
267107 IBM DB 346007 310 312 EXPERIENCE 308 312 69895611 Java J****EE Developer ****k ****k Music, Fi... Java J****EE Developer – ****k ****k Music, ... London London full_time permanent PARHAM CONSULTING LIMITED IT Jobs NaN NaN planetrecruit.com Test
In [119]:
for ad in [ads[2647], ads[4408]]:
    print(ad + '\n')
Java J****EE Developer  ****k  ****k  Music, Film & TV  London Java J****EE Developers required for software house with client sectors of music, film and TV. Salary: Maximum ****: Discretionary bonus and benefits package. Location: Near Euston and King's Cross, London THE COMPANY: Consistent new business wins for the world leader in the provision of software solutions to the Music and Entertainment industry has given rise to the need for an experienced Java Developer. The working environment here is very pleasant with a casual dress code, laid back and friendly atmosphere, but also hardworking and dynamic with the autonomy to drive your job role forward. This is predominantly a development role, but you will be involved in the full product life cycle including design and clientfacing duties, so they need a good allrounder. EXPERIENCE REQUIRED: The experience required for this role is as follows:  A minimum of 5 years experience in the development of web applications for the J****EE development platform.  A minimum of 5 years experience in Java  Strong knowledge in all of JSP, Servlet, JDBC, JavaScript, SQL and HTML technologies.  Good knowledge of CSS, XML and DHTML  A personality suited to clientfacing situations  good communication skills.  A good standard of written English The above experience is essential. You require all of the above experience in order for to be eligible for this role. The following experience is desirable, though not essential:  Knowledge of the WebSphere development environment and Application Server.  Knowledge of and experience with AJAX (Asynchronous JavaScript XML)  Experience with IBM DB**** THE ROLE: This is a full SDLC role: You will be involved in all stages of the software development cycle from requirements gathering and specification through development, implementation, QA and support. You'll be involved in different technologies across the board from Front Office to Back Office. 
Please note there is no Spring or Hibernate: The company have instead developed their own inhouse frameworks. THE OPPORTUNITY Why work here? As for prospects, where you can take this role is flexible, as the role entails a wide remit across most aspects of development. Therefore, if you wish, you could become more clientfacing and progress to what is essentially a business analyst role, or you may wish to specialise more on the technical side of things and push the boundaries of the technology. This is a central role that essentially can take off in any direction. Here, you will have enough autonomy to define your own role. Therefore, if you take the initiative you can shape your role for the future and drive your own progression. Overall, this is a lovely place to work  it's a privatelyowned company and feels more like a family company, not at all institutionalised  everyone has a stake, everyone has a say. Being music, entertainment and film it's an interesting industry to work in too, with a wide range of clients both local and foreign. Location: Near Euston and King's Cross, London

NEW  Java J****EE Developer – ****k  ****k  Music, Film TV  London Java J****EE Developers required for software house with client sectors of music, film and TV. Salary: Maximum ****: Discretionary bonus and benefits package. Location: Near Euston and King’s Cross, London THE COMPANY: Consistent new business wins for the world leader in the provision of software solutions to the Music and Entertainment industry has given rise to the need for an experienced Java Developer. The working environment here is very pleasant with a casual dress code, laid back and friendly atmosphere, but also hardworking and dynamic with the autonomy to drive your job role forward. This is predominantly a development role, but you will be involved in the full product lifecycle including design and clientfacing duties, so they need a good allrounder. EXPERIENCE REQUIRED: The experience required for this role is as follows: A minimum of 5 years experience in the development of web applications for the J****EE development platform. A minimum of 5 years experience in Java Strong knowledge in all of JSP, Servlet, JDBC, JavaScript, SQL and HTML technologies. Good knowledge of CSS, XML and DHTML A personality suited to clientfacing situations  good communication skills. A good standard of written English The above experience is essential. You require all of the above experience in order for to be eligible for this role. The following experience is desirable, though not essential: Knowledge of the Websphere development environment and application server. Knowledge of and experience with AJAX (Asynchronous JavaScript XML) Experience with IBM DB**** THE ROLE: This is a full SDLC role: You will be involved in all stages of the software development cycle from requirements gathering and specification through development, implementation, QA and support. You'll be involved in different technologies across the board from front office to back office. 
Please note there is no Spring or Hibernate: The company have instead developed their own inhouse frameworks. THE OPPORTUNITY: Why work here? As for prospects, where you can take this role is flexible, as the role entails a wide remit across most aspects of development. Therefore, if you wish, you could become more clientfacing and progress to what is essentially a business analyst role, or you may wish to specialise more on the technical side of things and push the boundaries of the technology. This is a central role that essentially can take off in any direction. Here, you will have enough autonomy to define your own role. Therefore, if you take the initiative you can shape your role for the future and drive your own progression. Overall, this is a lovely place to work  it's a privatelyowned company and feels more like a family company, not at all institutionalised  everyone has a stake, everyone has a say. Being music, entertainment and film it's an interesting industry to work in too, with a wide range of clients both local and foreign. Location: Near Euston and King’s Cross, London This job was originally posted as www.cwjobs.co.uk/JobSeeking/JavaJ****EEDeveloper****k****kMusicFilmTVLondon_job****

Next Steps

We could keep building out a rule-based approach:

  • Analyse this list to build up lists of positive and negative phrases
  • Search the document for those phrases
  • Look at the results and build new rules to get those phrases
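
As a rough sketch of what the first two steps might look like: a stop list removes extracted "terms" that are clearly noise (several of them, like this, all and etc, show up in the tables above), and a list of known-good phrases is searched for directly in the ad text. The contents of `NEGATIVE` and `POSITIVE` and both helper functions are hypothetical; real lists would come out of the analysis.

```python
import re

# Hypothetical stop list of extracted terms that are not really experience
NEGATIVE = {'experience', 'expertise', 'this', 'all', 'more', 'etc', 'other'}

# Hypothetical list of phrases we already trust
POSITIVE = ['project management', 'customer service', 'C++']

def clean_terms(terms):
    # Drop terms whose lowercased form is on the stop list
    return [t for t in terms if t.lower() not in NEGATIVE]

def find_phrases(text, phrases=POSITIVE):
    # Return (start, end, phrase) character spans of whole-phrase matches
    found = []
    for phrase in phrases:
        pattern = r'(?<!\w)' + re.escape(phrase) + r'(?!\w)'
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((m.start(), m.end(), phrase))
    return sorted(found)

terms = clean_terms(['C++', 'this', 'MFC', 'all', 'Experience'])
hits = find_phrases('Strong C++ skills and Project Management exposure')
print(terms)
print(hits)
```

The matches found this way can then be reviewed by hand to decide which new rules are worth adding.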

Or we could use this as the seed of a model-based approach:

  • Build an NER model on these base phrases
  • Annotate the predictions and refine the model
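
A minimal sketch of how the base phrases could be turned into seed annotations, assuming the character-offset format `(text, {'entities': [(start, end, label)]})` that spaCy 2.x accepts for NER training; other versions or libraries may expect something different. `to_ner_example` is a hypothetical helper, and it does not resolve overlapping matches, which would need handling before training.

```python
import re

def to_ner_example(text, phrases, label='EXPERIENCE'):
    # Find each phrase as a whole word and record its character span
    entities = []
    for phrase in phrases:
        pattern = r'(?<!\w)' + re.escape(phrase) + r'(?!\w)'
        for m in re.finditer(pattern, text):
            entities.append((m.start(), m.end(), label))
    return (text, {'entities': sorted(entities)})

example = to_ner_example(
    'Experience with Python and Django is essential',
    ['Python', 'Django'],
)
print(example)
```

Annotations produced like this are only as good as the rules behind them, which is why the second step, correcting the model's predictions by hand, matters.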

Or we could use some hybrid of the two.