This is a quick real-world experiment with computer-based Natural Language Processing (NLP), demonstrating how it can be of use outside the very large corporations that have been driving its rapid development in the last few years.


The aim is to analyse the Bicycle Stack Exchange site to extract information about what the posters (in aggregate; no identity information at all is used) may be interested in buying. The dataset contains a total of 58,000 posts, which a human could perhaps get a feel for in a day or two, but the aim is to develop an automatic way of summarising the sentiment that could be applied to a much larger dataset. A significant complicating factor is that the majority of posts are not about buying at all, so understanding the significance of the processed output is important.

An analysis such as this could, for example, be used by a new manufacturer to identify potential products; or it could be used by physical shops to identify which new types of products could be of interest to cyclists.


I use the large pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, as available in the Python transformers package.

Each post on the Bicycle Stack Exchange is analysed in turn with the transformers.BertForQuestionAnswering model: the post itself is the input paragraph, and the question in this case was simply “What should I buy?” [1]. The model assigns a significance score to each answer.
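The post does not spell out how the significance is computed. Below is a minimal sketch, assuming the conventional extractive question-answering recipe: BERT produces a start logit and an end logit for every token, the answer is the valid span inside the post with the highest start + end sum, and that sum is taken as its score. The checkpoint name `bert-large-uncased-whole-word-masking-finetuned-squad`, the 512-token truncation and `max_answer_len=30` are assumptions rather than details of the original analysis. A SQuAD-fine-tuned checkpoint is needed because plain `bert-large-uncased` ships with an untrained question-answering head. The helper names `best_span`, `load_model` and `answer_post` are hypothetical.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              context_start: int, context_end: int,
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Pick the highest-scoring answer span inside the post.

    Only tokens in [context_start, context_end) belong to the post (question
    and special tokens are excluded); the end may not precede the start, and
    spans are capped at max_answer_len tokens.
    Score = start_logit + end_logit (an ASSUMED definition of "significance").
    """
    best: Tuple[int, int, float] = (context_start, context_start, float("-inf"))
    for s in range(context_start, context_end):
        for e in range(s, min(s + max_answer_len, context_end)):
            score = start_logits[s] + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


def load_model(model_name: str = "bert-large-uncased-whole-word-masking-finetuned-squad"):
    """Load tokenizer and model once, on the GPU if one is available.

    The checkpoint name is an assumption; the original only says "BERT large".
    """
    # Imported lazily so best_span() can be used without torch/transformers.
    import torch
    from transformers import BertForQuestionAnswering, BertTokenizerFast

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = BertTokenizerFast.from_pretrained(model_name)
    model = BertForQuestionAnswering.from_pretrained(model_name).to(device).eval()
    return tokenizer, model, device


def answer_post(post: str, tokenizer, model, device: str,
                question: str = "What should I buy?") -> Tuple[str, float]:
    """Ask `question` of a single post and return (answer text, score)."""
    import torch

    # Question first, post second; only the post is truncated to fit 512 tokens.
    inputs = tokenizer(question, post, truncation="only_second",
                       max_length=512, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(**inputs)

    # sequence_ids: 0 for question tokens, 1 for post tokens, None for specials.
    seq_ids = inputs.sequence_ids(0)
    context_start = seq_ids.index(1)
    context_end = len(seq_ids) - seq_ids[::-1].index(1)

    s, e, score = best_span(out.start_logits[0].tolist(),
                            out.end_logits[0].tolist(),
                            context_start, context_end)
    answer = tokenizer.decode(inputs["input_ids"][0][s:e + 1].tolist())
    return answer, score
```

Usage would be `tokenizer, model, device = load_model()` once, then `answer_post(text, tokenizer, model, device)` for each post, collecting the `(answer, score)` pairs for ranking. Batching several posts per forward pass would make better use of the GPU but is left out for clarity.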

The dataset is analysed on a Tesla K80 GPU in Google Colab; processing on even a high-end, high-core-count CPU is substantially slower.


The top 200 results (ordered by significance) are shown below. They nicely illustrate both the limitations and the successes of computer-based Natural Language Processing:

  1. A significant fraction of the answers are not really answers to the question “What should I buy?” and are not relevant to an analysis of purchasing sentiment, so the false positive rate in this particular instance is high.

  2. There is, however, a great deal of useful information in a very compact form: this is a summary of almost 8 million words of input data! Yet it is possible to scan the summary in less than five minutes and get a very good idea of what the posters are thinking of purchasing.

  3. In fact, even as a casual cyclist unconnected to the industry, I got some good ideas of what to buy from this NLP summary (in case you are interested: “squirt dry lube”, “bungee cords”, “pedal extenders”; I had no idea these existed!). I’ll be giving the fixed gear a miss, though.
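Producing the ranked list from the per-post results is straightforward. A minimal sketch, assuming each post has already been reduced to an `(answer, score)` pair (the function name `top_answers` is hypothetical): `heapq.nlargest` picks the highest scores without fully sorting all ~58,000 results, and the selection is then re-sorted in ascending order of score, which is how the table below is laid out.

```python
import heapq
from typing import Iterable, List, Tuple


def top_answers(results: Iterable[Tuple[str, float]],
                n: int = 200) -> List[Tuple[str, float]]:
    """Keep the n highest-scoring (answer, score) pairs, lowest score first."""
    # nlargest returns the n best pairs, highest score first.
    best = heapq.nlargest(n, results, key=lambda r: r[1])
    # Re-sort ascending so the strongest answers end up at the bottom.
    return sorted(best, key=lambda r: r[1])
```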

All the results, in ascending order of score (the highest-scoring answers are at the bottom):

Answer Score
without replacing the cranks 3.984334
brake lever 3.987443
a multispeed wheel 3.992327
single - speed bike 3.998377
kickstand has been designed for those heavy loads 4.000814
6 - bolt or centre - lock rotor interface 4.002281
aluminum 4.007900
a single wheel bike trailer 4.007934
chris king or cane creek 4.013450
my bike is a racing bike - or - my bike has a … 4.014205
a 2 - wheeled cargo trailer 4.014745
cavitation 4.016710
1997 4.024745
25mm 4.036997
your bike does not fit properly 4.037504
shoes legwear ( rain ) coats face / helmet 4.044062
new asphalt or roadworks 4.054311
to get a really thick pair of gloves 4.055343
solvent 4.067034
kryptonite or abus thx 4.067524
30 seconds 4.067729
700x20 - 25c or 700x25 - 32c 4.069897
it is advantageous 4.071337
it can cause the chain to hit the chainstay an… 4.072789
bags for cell phones or very slim compact cameras 4.084034
sti levers 4.087434
a cheap pair of platform pedals 4.087685
low likelihood of forgetting you pack on your … 4.087863
a trailer or rack 4.087946
kickstarter kit by barak electric 4.089828
stabilising the fork so it does not wobble to … 4.092712
bicycle tow bar 4.093565
a little cylinder that runs on the front brake… 4.094686
to repaint 4.097400
9sp chainrings 4.112884
there is some personal preference from mechani… 4.121975
what lock 4.133030
use your front brake and push your rear end 4.141050
a new bike 4.142982
cateye micro wireless mc100w or cateye velo 7 4.148888
~ $ 480 4.149406
magura 4.158324
etrto code 4.165245
319 pounds 4.168193
tacx blue or satori 4.171396
24 speed carrera vengeance 4.176042
where would be the best place to look for this… 4.182470
a trailer bike 4.182586
google maps 4.188448
mountain bike 4.190742
backwards 4.196258
blackberry 4.198652
i have no idea where it’s from 4.202947
most stores only offer trailers sized for chil… 4.208519
training 4.211099
700x25c 4.211437
5, 000 miles 4.214664
strength or cardiovascular goals 4.219513
faster than their eventual main pace 4.228981
token tf24 or the token ninja tf37 4.240726
in a bladder 4.254608
a wheel 4.254874
mechanical 4.256982
a hex bolt 4.265576
beater steel 4.279584
in the spokes. if you don’t have a pannier or … 4.280002
mountain bike 4.290862
quite a bit 4.295190
rims 4.297310
pulling up or pushing down 4.298195
a helicoil kit 4.301330
if parking your own bike in that space makes i… 4.305521
rims that will fit cyclocross or road bike tires 4.311843
disk brakes 4.312477
screw clamps 4.333948
helmet mounted lights 4.340793
14 / 2mm 4.341318
new / better brakes 4.364946
back brake 4.375150
corrosion proof coating 4.381238
price, capacity and design 4.382625
mt - 40 4.383696
NaN 4.384270
cn - m981 cn - m980 cn - hg95 cn - hg94 cn - h… 4.391233
rest 4.400147
ergonomic “ locking grips 4.401465
replacing your old tire with another tire you … 4.403919
the external width just keeps shrinking 4.409768
torque wrench 4.424376
increases bb stiffness and durability, while r… 4.433124
too narrow for optimal grip for most people 4.436118
non - standard 4.455884
rotary plumbing pipe cutter 4.458583
a single speed bike 4.477700
a standard pull brake lever 4.480142
replacement shifters 4.493139
how would i know 4.506325
NaN 4.506698
crank arm 4.512474
extremely functional 4.516920
superglue or duct tape 4.522046
chain tensioners 4.524573
a really long seat post installed way too far up 4.536146
single spokes 4.539762
rear metal part of the saddle 4.546989
a crank tool 4.600437
flinging my leg over the saddle while the bike… 4.603375
a “ boot “ inside the tire 4.617736
a upright conventional bicycle 4.639482
advertising 4.659174
wired vs wireless 4.659637
woods ( dunlop ) valve 4.664250
high - end 4.664501
designating some car lanes as bike - only lanes 4.665493
funds and storage 4.668933
any tire of similar quality would be no more o… 4.677284
the distance between the tire and the fender 4.683753
vibrations that hurt my fingers 4.686222
mechanical or hydraulic 4.686745
a helmet with a face shield 4.703397
nowadays we don’t care about scratches “. but … 4.707049
hed tomcat disc 4.710005
a 54cm bike 4.715860
every 100km to 200km 4.716311
presta 4.725976
a bicycle 4.728568
kool stop thinlines 4.735795
they are lighter and smaller than bikes easier… 4.736367
double sided tape and some sort of neoprene or… 4.748918
sram / avid 4.751222
ride up grades 4.760292
chain stay length 4.768253
wider 4.779448
improving your tt or climbing speeds 4.779812
medium sandpaper 4.779857
700c 4.802800
don’t 4.817946
in a hydration pack on my back 4.842426
cartridge bottom brackets 4.849416
ball and rod 4.866506
day / night, foggy vs. direct sun, perhaps the… 4.872999
many other companies manufacture freehubs with… 4.891198
custom pedal extenders 4.920067
it could very easily sap the energy reserves f… 4.922484
your preferences and what kind of riding you w… 4.924358
maximum braking efficiency and better control … 4.931591
tubeless tyres 4.950227
disc brakes 4.963746
i can’t find info 4.970236
my bike is very sturdy 4.973813
” brake hard, then release completely & amp ; … 4.974698
degreaser 4.983543
ebay 4.986038
it depends 5.017685
60 mm between the jockey wheels 5.028659
as far forward 5.051633
sunblock 5.114271
$ 1500 5.120024
700c 5.128308
rolling the hips forward slightly 5.159712
26. 8 mm 5.160173
hot water 5.165559
a simple conversion 5.168078
castrol ep 90 5.184137
bungee cords 5.201746
the spinning mass of the wheel 5.233724
hardtail 29er 5.325470
paint job 5.327999
5 % less 5.331928
schraeder 5.334766
nut driver 5.349661
20 + hours a week 5.357866
local bike shops 5.364977
hybrid 5.377503
grit 5.387483
toasty 5.391614
puncture repair glue 5.401884
bike trousers 5.405810
non - park tool tool 5.408729
comfort, grip and possibly rolling resistance 5.426383
fixed gear 5.435682
internal - cam 5.438887
pinkbike 5.464226
watts 5.466736
a tarp 5.500462
merinos 5.509621
squirt dry lube 5.529767
within a couple feet 5.539470
does not drop as far into potholes, and genera… 5.545146
700x28c 5.551797
machines generally do a more precise job than … 5.557894
35 mph 5.581415
one speed chain 5.587879
a riding position that’s optimized for hard pe… 5.757022
much more rapid 5.757543
signage on the bike path saying bicycles are a… 5.778970
between dropout and skewer end 5.820508
new parts or adjustments 5.843324
autumn 5.848872
metal rod brakes 5.877050
a pressure gauge 5.978394
2000 miles 6.056773
700x35c 6.087281
56mm 6.097930
off - road or on dirty / wet roads 6.140635
change clothes at work 6.186455
because of the difference in distance travelled 6.192146
longest battery life 6.230284
shifting gears 6.242918
by using it 6.256811
hub, hub axis, bearings, spokdes 6.356838
100 % daylight ; no mechanical failures ; and,… 6.378467
safety reasons 6.620848
overactive 6.975286
melted down and the metal used for something new 7.1861
[1] Experimenting with different questions is informative but outside the scope of this initial post!