⚡️ Saturday AI Sparks 🤖 - 🌐📝#️⃣ Translate → Summarize → Hashtags
Posted On: October 11, 2025
Introduction
AI tasks rarely happen in isolation. In real-world workflows, you often want to chain multiple AI steps together — for example, translate a text, summarize it, and then generate relevant hashtags for sharing.
In this post, we’ll show how to combine these three tasks into a simple end-to-end pipeline using scikit-learn’s Pipeline along with Hugging Face Transformers and Deep Translator.
Why This Matters
- Translation: Makes content globally accessible.
- Summarization: Reduces long text into a short, digestible version.
- Hashtag Generation: Improves discoverability when content is shared on social platforms.
Instead of running these steps separately, we’ll tie them together with a single pipeline.
Step 1 — Translate
We use deep-translator to translate text into English before summarization.
This ensures the summarizer works on consistent input.
from deep_translator import GoogleTranslator
def translate_list(texts, source="auto", target="en"):
    return [GoogleTranslator(source=source, target=target).translate(t) for t in texts]
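A quick sanity check of the helper (translations come from the live Google Translate backend, so the exact wording may vary):

print(translate_list(["Hola mundo", "Bonjour le monde"]))
# e.g. ['Hello world', 'Hello world']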
Step 2 — Summarize
We apply a pretrained summarization model from Hugging Face (distilbart-cnn-12-6).
The summarizer shortens the text while preserving its meaning.
from transformers import pipeline
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
def summarize_list(texts):
    outs = []
    for t in texts:
        out = summarizer(t, max_length=120, min_length=40, do_sample=False)
        outs.append(out[0]["summary_text"])
    return outs
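For example, on a short English paragraph (the exact summary wording depends on the model, and very short inputs may trigger a length warning):

article = (
    "Artificial intelligence is transforming entire industries, from healthcare "
    "to finance. It automates tasks, finds complex patterns, and delivers "
    "personalized experiences at scale."
)
print(summarize_list([article])[0])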
Step 3 — Generate Hashtags
We create simple hashtags by extracting frequent keywords from the summary.
No extra dependencies are needed — just Python’s built-in libraries.
import collections, string, re
STOPWORDS = {"the","and","is","in","to","of","a","for","on","it","as","with","this","that","by","an","be","are"}
def clean_tokens(text):
    text = text.lower().translate(str.maketrans("", "", string.punctuation + string.digits))
    return [w for w in text.split() if w not in STOPWORDS and len(w) > 2]

def hashtags_from_list(texts, top_k=8):
    all_tags = []
    for t in texts:
        words = clean_tokens(t)
        counter = collections.Counter(words)
        most_common = [w for w, _ in counter.most_common(top_k)]
        tags = ["#" + w.capitalize() for w in most_common]
        all_tags.append(" ".join(tags))
    return all_tags
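Running it on a sample summary (for equally frequent words, Counter.most_common keeps first-appearance order):

summary = ("artificial intelligence is transforming industries by automating "
           "tasks, finding complex patterns, and enabling personalization at scale")
print(hashtags_from_list([summary])[0])
# e.g. "#Artificial #Intelligence #Transforming #Industries #Automating #Tasks #Finding #Complex"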
Step 4 — Combine into a Pipeline
With scikit-learn’s Pipeline, we can chain all steps together into a single, reusable workflow.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
pipe = Pipeline(steps=[
    ("translate_to_en", FunctionTransformer(lambda X: translate_list(X, source="auto", target="en"), validate=False)),
    ("summarize_en", FunctionTransformer(lambda X: summarize_list(X), validate=False)),
    ("hashtags", FunctionTransformer(lambda X: hashtags_from_list(X, top_k=8), validate=False)),
])
Calling fit_transform on a list of texts runs all three steps in order. Note that the pipeline returns only the output of its final step (the hashtags); the English summary is an intermediate result, which the standalone function in the full snippet below exposes directly.
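A minimal end-to-end run, using the Spanish sample from the next section:

texts = [
    "La inteligencia artificial está transformando industrias enteras, desde la "
    "salud hasta las finanzas. Permite automatizar tareas, encontrar patrones "
    "complejos y ofrecer experiencias personalizadas a gran escala."
]
print(pipe.fit_transform(texts)[0])
# e.g. "#Artificial #Intelligence #Industries ..."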
Sample Output
Input text (Spanish):
La inteligencia artificial está transformando industrias enteras, desde la salud hasta las finanzas.
Permite automatizar tareas, encontrar patrones complejos y ofrecer experiencias personalizadas a gran escala.
Summary (EN):
Artificial intelligence is transforming industries by automating tasks, finding
complex patterns, and enabling personalization at scale.
Hashtags (EN):
#Artificial #Intelligence #Industries #Automating #Tasks #Patterns #Personalization #Scale
Key Takeaways
- AI tasks like translation, summarization, and keywording can be chained into one workflow.
- scikit-learn’s Pipeline makes the process modular and reusable.
- Hashtags can be generated from simple frequency analysis — no heavy NLP needed.
- This lightweight workflow is practical for social content automation.
Code Snippet:
from deep_translator import GoogleTranslator
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
import re
import string
# Example multilingual input (replace with your own)
original_text = """
La inteligencia artificial está transformando industrias enteras, desde la salud hasta las finanzas.
Permite automatizar tareas, encontrar patrones complejos y ofrecer experiencias personalizadas a gran escala.
"""
src_lang = "auto" # detect automatically
pivot_lang = "en" # summarize in English
tgt_lang = "en" # change (e.g., "es", "fr", "de") to translate final outputs back
def translate_text(text: str, source: str, target: str) -> str:
    """
    Translate `text` from `source` → `target` using GoogleTranslator (no API key).
    Set source="auto" to auto-detect the input language.
    """
    return GoogleTranslator(source=source, target=target).translate(text)
translated_en = translate_text(original_text, src_lang, pivot_lang)
print("=== Translated → English (preview) ===\n", translated_en[:400], "...\n")
# Build the summarization pipeline once (downloads model on first run)
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
def summarize(text: str, max_chars: int = 1800) -> str:
    """
    Summarize text. If it's very long, we truncate for demo purposes.
    For production, chunk the text and summarize per chunk, then summarize the summaries.
    """
    text = text.strip()
    if len(text) > max_chars:
        text = text[:max_chars]
    out = summarizer(text, max_length=130, min_length=45, do_sample=False)
    return out[0]["summary_text"].strip()
summary_en = summarize(translated_en)
print("=== Summary (EN) ===\n", summary_en, "\n")
def simple_clean(text: str) -> str:
    """Lowercase, remove punctuation/numbers, and collapse spaces."""
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # remove URLs
    text = text.translate(str.maketrans("", "", string.punctuation + string.digits))
    text = re.sub(r"\s+", " ", text).strip()
    return text
def top_keywords_tfidf(text: str, k: int = 8):
    """
    Return top-k keywords using TF-IDF on a single document by splitting into sentences.
    (Crude but effective for short social captions.)
    """
    # Split into pseudo-documents (sentences) BEFORE cleaning, so the sentence
    # punctuation is still available; then clean each sentence individually
    sentences = [simple_clean(s) for s in re.split(r"[.!?]\s+", text) if s.strip()]
    sentences = [s for s in sentences if s]
    if not sentences:
        sentences = [simple_clean(text)]
    vec = TfidfVectorizer(
        stop_words="english",
        ngram_range=(1, 2),  # allow unigrams + bigrams
        max_features=1000,
        token_pattern=r"(?u)\b[a-zA-Z][a-zA-Z]+\b",  # alphabetic tokens (≥2 letters)
    )
    X = vec.fit_transform(sentences)
    # Aggregate TF-IDF scores across sentences
    scores = X.sum(axis=0).A1
    terms = vec.get_feature_names_out()
    ranked = sorted(zip(terms, scores), key=lambda x: x[1], reverse=True)[:k]
    return [w for w, _ in ranked]
def to_hashtags(words):
    """Convert keyword tokens/phrases to social-friendly #hashtags."""
    tags = []
    for w in words:
        token = re.sub(r"\s+", "", w)  # remove spaces for bigrams
        token = re.sub(r"[^a-zA-Z0-9]", "", token)
        if token:
            tags.append("#" + token[:28])  # keep tags readable
    # De-duplicate while preserving order
    seen = set()
    uniq = []
    for t in tags:
        if t.lower() not in seen:
            uniq.append(t)
            seen.add(t.lower())
    return uniq
keywords = top_keywords_tfidf(summary_en, k=10)  # pass the raw summary; cleaning happens per sentence inside
hashtags_en = to_hashtags(keywords)
print("=== Hashtags (EN) ===\n", " ".join(hashtags_en), "\n")
def translate_summarize_hashtag(text: str, src="auto", pivot="en", tgt="en", k=10):
    # 1) translate → English
    en = translate_text(text, src, pivot)
    # 2) summarize (EN)
    summ_en = summarize(en)
    # 3) hashtags from EN summary
    kws = top_keywords_tfidf(summ_en, k=k)
    tags_en = to_hashtags(kws)
    # 4) optionally translate both outputs back to the target language
    if tgt == "en":
        return summ_en, tags_en
    out_summary = translate_text(summ_en, "en", tgt)
    out_tags = ["#" + re.sub(r"\s+", "", translate_text(t.lstrip("#"), "en", tgt)) for t in tags_en]
    return out_summary, out_tags
demo_summary, demo_tags = translate_summarize_hashtag(original_text, src=src_lang, pivot=pivot_lang, tgt=tgt_lang, k=10)
print("=== DEMO SUMMARY ===\n", demo_summary, "\n")
print("=== DEMO HASHTAGS ===\n", " ".join(demo_tags))